Cloud Dataprep - Multiply rows in one column based on values in other column - google-cloud-dataprep

I am working in Cloud Dataprep and I have a case like this:
Basically, I need to create new rows in column 2 based on how many rows there are with matching data in column 1.
Is it possible, and if so, how?

As I understand it, the scenario you want is: obtain all values from column1 that match a value present in column2. There are many things to consider in this scenario that you did not describe, such as: Can values in column2 be repeated? If a value in column2 is missing from column1, what should happen? And what should happen the other way around?
However, as a general approach to this issue, I would use the following flow:
With a flow such as this one, you take the input table, which has two columns like this:
In the recipes FIRST_COLUMN and SECOND_COLUMN, you split the two columns into separate branches and apply the necessary steps to clean each column. For column1, I understand nothing needs to be done. For column2, I understand that you will have to remove duplicates (again, this is my guess; it depends on your specific implementation, which you have not completely described) and delete empty values. You can do that by applying the following transforms:
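The column2 cleanup can be sketched in plain Python to make the intended result concrete. This is not Dataprep code: it assumes the column arrives as a list of strings in which empty values are `None` or blank, and `clean_column` is a hypothetical helper name. It deletes empty values and keeps only the first occurrence of each duplicate, which is roughly what the delete-empty-rows and remove-duplicates transforms achieve.

```python
from typing import Iterable, List, Optional


def clean_column(values: Iterable[Optional[str]]) -> List[str]:
    """Drop empty values and duplicates, keeping first-seen order.

    Hypothetical helper mirroring the 'delete empty values' and
    'remove duplicates' steps of the SECOND_COLUMN branch.
    """
    seen = set()
    cleaned: List[str] = []
    for value in values:
        # Treat None and whitespace-only strings as empty values.
        if value is None or not value.strip():
            continue
        # Keep only the first occurrence of each value.
        if value in seen:
            continue
        seen.add(value)
        cleaned.append(value)
    return cleaned


if __name__ == "__main__":
    column2 = ["a", "", "b", "a", None, "c", "b"]
    print(clean_column(column2))  # ['a', 'b', 'c']
```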
Finally, you can join both columns together. Depending on your needs (only values present in both columns should appear, only values present in columnX should appear, etc.), you should apply a different JOIN strategy. Use a join key like column1 = column2 (as in the image), and if you select only the second column in the left-side menu, you will get a single-column result.
Note that in this case I used an inner join; other JOIN types will produce completely different results. Use the one that best fits your requirements.
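To see why the JOIN type matters, here is a minimal Python sketch of an inner join and a left join on column1 = column2. The helper names `inner_join` and `left_join` and the sample values are made up for illustration; Dataprep performs the join itself. Note how a column2 value is repeated once for every matching row in column1, which is the row multiplication the question asks for.

```python
from typing import List, Optional, Tuple


def inner_join(left: List[str], right: List[str]) -> List[Tuple[str, str]]:
    """Pair each left value with every equal right value (column1 = column2)."""
    return [(l, r) for l in left for r in right if l == r]


def left_join(left: List[str], right: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Like inner_join, but unmatched left values are kept, paired with None."""
    rows: List[Tuple[str, Optional[str]]] = []
    for l in left:
        matches = [r for r in right if r == l]
        if matches:
            rows.extend((l, r) for r in matches)
        else:
            rows.append((l, None))
    return rows


column1 = ["x", "x", "y", "z"]
column2 = ["x", "y", "w"]  # already cleaned: no duplicates, no empties

print(inner_join(column1, column2))  # [('x', 'x'), ('x', 'x'), ('y', 'y')]
print(left_join(column1, column2))   # adds ('z', None) at the end
# Keeping only the second column of the inner join:
print([r for _, r in inner_join(column1, column2)])  # ['x', 'x', 'y']
```

Keeping only the second element of each inner-join pair gives the single-column result. The nested loops are quadratic, which is fine for a sketch but not how a real join engine works.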

Related

Amino acid screening library in KNIME

I have a task to create a tetrapeptide screening library of amino acids using KNIME. Sadly, I have never used KNIME before. I need to create a workflow with all 20 amino acids, multiply it by another 20, then multiply the result by another 20, and repeat to get the final set of tetrapeptides. Can someone suggest how to input the amino acids in KNIME? Thank you very much!
Use a Table Creator node to enter the amino acid single-letter codes, one per row. Now use a Cross Joiner node to cross-join the table to itself; you should now have a table with rows like:
A|A
A|C
etc.
Now feed this table into both inputs of a second Cross Joiner node, which should now give you quite a long table (20^4 = 160,000 rows) starting something like:
A|A|A|A
A|A|A|C
A|C|A|A
A|C|A|C
etc.
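The row counts of the two Cross Joiner steps can be checked with a small Python sketch using `itertools.product`, which produces the same combinations as a cross join. The list of the 20 standard single-letter codes and the `cross_join` helper are assumptions for illustration; the row order KNIME produces may differ.

```python
from itertools import product
from typing import List, Tuple

# The 20 standard amino acids as single-letter codes (one per row in the
# Table Creator node).
AMINO_ACIDS: List[str] = list("ACDEFGHIKLMNPQRSTVWY")

Row = Tuple[str, ...]


def cross_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Combine every left row with every right row, as a cross join does."""
    return [l + r for l, r in product(left, right)]


single: List[Row] = [(aa,) for aa in AMINO_ACIDS]
pairs = cross_join(single, single)    # 20 * 20 = 400 rows
tetramers = cross_join(pairs, pairs)  # 400 * 400 = 160,000 rows

print(len(pairs), len(tetramers))  # 400 160000
print(tetramers[:3])  # [('A', 'A', 'A', 'A'), ('A', 'A', 'A', 'C'), ('A', 'A', 'A', 'D')]
```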
Now use a Column Aggregator node: select all columns as aggregation columns, set the aggregation method to Concatenate, and change the delimiter to an empty string.
This will give you a table with a single column, 'Peptide':
AAAA
AAAC
ACAA
ACAC
etc.
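The Column Aggregator step amounts to joining each row's values with the chosen delimiter. A minimal Python sketch, assuming the same 20 single-letter codes and a hypothetical `concatenate_columns` helper, builds the 'Peptide' strings directly:

```python
from itertools import product
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def concatenate_columns(row: Sequence[str], delimiter: str = "") -> str:
    """Join all column values of one row into a single string."""
    return delimiter.join(row)


# product(..., repeat=4) yields the same four-column rows as the two cross joins.
peptides: List[str] = [concatenate_columns(row) for row in product(AMINO_ACIDS, repeat=4)]

print(peptides[:3], len(peptides))  # ['AAAA', 'AAAC', 'AAAD'] 160000
print(concatenate_columns(("A", "C", "A", "C"), "|"))  # A|C|A|C
```

With a non-empty delimiter you get back the pipe-separated form shown earlier, which is why the delimiter has to be cleared.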
If you want the output as a chemical structure, then as of v1.36.0 the Vernalis community contribution contains a Speedy Sequence to SMILES node, which converts the sequence to a SMILES string (make sure you select the option indicating that your input column is a protein!).
The full workflow is as shown:

How to multiply each element of a database column by a variable

I am trying to add a column to a collection by multiplying the existing database column recycling by 0.9, but I get a runtime error.
I tried to multiply by 0.9 directly in the function, but it shows an error, so I created a class and did the multiplication there, with no luck. What could be the problem?
Your error message is telling you what the problem is: your database query is using GROUP BY in an invalid way.
It doesn't make sense to group by one column and then select other columns (in your case you've selected all of them): a grouped query returns one row per group, so which value should each non-grouped column contain? You have to either group by every column you select, or use aggregates such as SUM for the non-grouped columns (or both).
Perhaps you meant to ORDER BY that column (orderBy(dt.recycling.asc()) for ascending order in QueryDSL), or to select all rows with a particular value of that column (for example, where(dt.recycling.eq(55)))?
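A small sketch with Python's built-in `sqlite3` module shows both valid shapes: multiplying the column by 0.9 per row (no GROUP BY needed), and a grouped query where every selected column is either grouped or aggregated. The table `dt` and its `region` column are hypothetical stand-ins for the question's schema. SQLite tolerates bare non-aggregated columns in grouped queries, so it will not reproduce the error; stricter databases such as PostgreSQL or SQL Server typically reject such a query.

```python
import sqlite3

# Hypothetical table: a numeric `recycling` column plus a `region` to group by.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dt (region TEXT, recycling REAL)")
conn.executemany(
    "INSERT INTO dt (region, recycling) VALUES (?, ?)",
    [("north", 10.0), ("north", 20.0), ("south", 50.0)],
)

# 1) No GROUP BY: compute the scaled value for every row.
scaled = conn.execute(
    "SELECT region, recycling, recycling * 0.9 AS recycling_90 "
    "FROM dt ORDER BY rowid"
).fetchall()

# 2) Grouped: every selected column is grouped (region) or aggregated (SUM).
totals = conn.execute(
    "SELECT region, SUM(recycling) * 0.9 AS recycling_90 "
    "FROM dt GROUP BY region ORDER BY region"
).fetchall()

print(scaled)
print(totals)
conn.close()
```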

Can't figure out how to assign a column from a Dim table to a measure group

The DW is written in a foreign language, so I translated it in the pictures.
It is about some facility where you can rent rooms and equipment.
In the Fact table I have one measure column, which represents the number of reservations, and it works as expected. The problem is that in the DimPayment table I have the columns PaymentSum and PaymentSumWDiscount, which I need to use as measures. I tried to do something in the Calculations tab, but I only get null values.
I can't figure out how to use the PaymentSum and PaymentSumWDiscount columns as measures.
JOIN the FactReserving table to the DimPayment table in the Data Source View; you can then access those two columns in the measure group.
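In relational terms, the join in the Data Source View gives every fact row its payment amounts. The sketch below uses Python's `sqlite3` only to illustrate that join; the key columns `PaymentKey` and `ReservationKey`, the `ReservationCount` measure, and the sample values are assumptions, since the actual schema is only shown in the pictures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimPayment (
        PaymentKey INTEGER PRIMARY KEY,
        PaymentSum REAL,
        PaymentSumWDiscount REAL
    );
    CREATE TABLE FactReserving (
        ReservationKey INTEGER PRIMARY KEY,
        PaymentKey INTEGER REFERENCES DimPayment (PaymentKey),
        ReservationCount INTEGER
    );
    INSERT INTO DimPayment VALUES (1, 100.0, 90.0), (2, 250.0, 200.0);
    INSERT INTO FactReserving VALUES (10, 1, 1), (11, 2, 1);
    """
)

JOIN = "FROM FactReserving AS f JOIN DimPayment AS p ON p.PaymentKey = f.PaymentKey"

# Each fact row now carries the payment amounts next to the existing measure.
rows = conn.execute(
    "SELECT f.ReservationKey, f.ReservationCount, p.PaymentSum, p.PaymentSumWDiscount "
    + JOIN + " ORDER BY f.ReservationKey"
).fetchall()

# So they can be summed like any other measure.
totals = conn.execute(
    "SELECT SUM(f.ReservationCount), SUM(p.PaymentSum), SUM(p.PaymentSumWDiscount) " + JOIN
).fetchone()

print(rows)    # [(10, 1, 100.0, 90.0), (11, 1, 250.0, 200.0)]
print(totals)  # (2, 350.0, 290.0)
conn.close()
```

This assumes each reservation has its own payment row. If several fact rows pointed at the same DimPayment row, summing the joined PaymentSum would count that payment once per fact row.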
It looks like you want to use your dimension attributes as measures.
One way to achieve this is to make your dimension table a measure group as well.
The following example works with the AdventureWorks sample to create a measure group for the Product table:
In Visual Studio / SSDT, open your cube, create a new measure group, and select your table (in your scenario, the DimPayment table).
In Dimension Usage, the relationship for this would be 'Fact', since your fact and dimension are in the same table.
Now you should be able to use the measures normally.
All the numeric fields are automatically added as measures:
Using it as a measure:

Tableau: show categories from a calculation even when a category is not visible

I have a calculation that outputs multiple values, and I create a table on those values. For example, with the data below my formula is:
if data is 1 then calculation is `one`
if data is 2 then calculation is `two`
if data is 3 then calculation is `three`
Because `three` doesn't appear in the output, it is not displayed when I create a table. Is there any way to display it?
I tried Table Layout >> Show Empty Rows and Columns, but it didn't work.
data calculation
1 one
2 two
Tableau discovers the possible values for a dimension field dynamically from the query results.
If ‘three’ does not appear in your data, then how do you expect Tableau to know to make a column header for that nonexistent, but potential, value? It can’t read your mind.
This situation does occur often, though: perhaps you want row or column headers to remain stable even when you change filters in a way that causes some values to no longer appear in the query results.
There are a few ways you can force Tableau to pad or complete a domain:
One solution is to pad your data so that each value of your dimension field appears in at least one data row.
You can often do this easily by using a union to append some extra rows to your original data. The padding rows won’t affect any results if you leave all their measure columns null, since aggregation functions ignore nulls.
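The effect of padding rows can be sketched in Python. The rows below use the question's calculation values plus a hypothetical numeric measure; the padding rows carry `None` for the measure, and the aggregation skips `None` in the way SQL and Tableau SUM ignore nulls. `sum_by_category` is an illustrative helper, not a Tableau API.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, Optional[float]]  # (calculation value, hypothetical measure)

original: List[Row] = [("one", 5.0), ("two", 7.0), ("two", 1.0)]
# One padding row per category that must always get a header; the measure is
# left null so it cannot change any total.
padding: List[Row] = [(category, None) for category in ("one", "two", "three")]


def sum_by_category(rows: List[Row]) -> Dict[str, Optional[float]]:
    """SUM per category, skipping nulls; a category with only nulls stays None."""
    totals: Dict[str, Optional[float]] = {}
    for category, value in rows:
        current = totals.setdefault(category, None)
        if value is not None:
            totals[category] = (current or 0.0) + value
    return totals


print(sum_by_category(original))            # {'one': 5.0, 'two': 8.0}
print(sum_by_category(original + padding))  # {'one': 5.0, 'two': 8.0, 'three': None}
```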
Another common solution, which takes a bit more effort, is to create what is known as a scaffolding data source, which is little more than a list of your dimension members. You can then use that data source as the primary data source with data blending, making your original data source secondary.
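Data blending with a scaffold behaves roughly like a left join from the primary source's members to the aggregated secondary source. A minimal Python sketch of that idea, with a hypothetical measure, made-up values, and illustrative helper names `aggregate` and `blend`:

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Scaffold (primary): just the list of dimension members that must appear.
scaffold: List[str] = ["one", "two", "three"]

# Original data (secondary): (calculation value, hypothetical measure) rows.
original: List[Tuple[str, float]] = [("one", 5.0), ("two", 7.0), ("two", 1.0)]


def aggregate(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """SUM the measure per member; the secondary source is aggregated before blending."""
    totals: Dict[str, float] = defaultdict(float)
    for member, value in rows:
        totals[member] += value
    return dict(totals)


def blend(members: List[str], secondary: Dict[str, float]) -> List[Tuple[str, Optional[float]]]:
    """Keep every scaffold member; members with no secondary data get None."""
    return [(member, secondary.get(member)) for member in members]


print(blend(scaffold, aggregate(original)))
# [('one', 5.0), ('two', 8.0), ('three', None)]
```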
There are two situations where Tableau can automatically detect the absence of data and leave space for it in the visualization:
For numeric types, you can create a bin field that automatically pads for missing bins.
Similarly, date fields can show missing values because, as with bins, Tableau can tell when a month doesn’t appear in the data and leave room for it in the view.

Merge all the data in the second column for each unique value in the first column

I have two columns of data. Some of the data in the first column repeats (they represent questions). The data in the second column is unique (they represent multiple answers to the same question).
I need to merge all the data in the second column for each unique value in the first column. For example:
Q,A
1,yes.
1,is possible.
2,no.
2,not possible.
2,cannot do this.
2,impossible.
3,maybe.
merged to:
Q,A
1,yes.is possible.
2,no.not possible.cannot do this.impossible.
3,maybe.
Something like this is crude but may be adequate:
=IF(A1=A2,C1&B2,B2)
copied down to suit. Then select the last entry for each question number (identifiable as the rows where something like =A1=A2, copied down to suit, returns FALSE).
Questions in column A, sorted
Answers in column B
In C1, use =B1
In C2, use =IF(A2=A1,C1&B2,B2)
Drag the formula in C2 down.
It keeps appending the answers as long as the question stays the same. When it reaches a new question, it starts a new string. The last row for each question holds that question's complete string in column C.
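The column C logic can be replayed in Python to check it against the expected output from the question. `running_concat` mirrors C1 = B1 and Cn = IF(An = A(n-1), C(n-1) & Bn, Bn); `last_per_question` keeps the rows where the next row has a different question (the rows where =A1=A2 returns FALSE). Both helper names are illustrative.

```python
from typing import List, Tuple

# The question's data: (Q in column A, A in column B), sorted by question.
ROWS: List[Tuple[str, str]] = [
    ("1", "yes."), ("1", "is possible."),
    ("2", "no."), ("2", "not possible."), ("2", "cannot do this."), ("2", "impossible."),
    ("3", "maybe."),
]


def running_concat(rows: List[Tuple[str, str]]) -> List[str]:
    """Column C: restart at each new question, otherwise append to the row above."""
    col_c: List[str] = []
    for i, (question, answer) in enumerate(rows):
        if i > 0 and question == rows[i - 1][0]:
            col_c.append(col_c[-1] + answer)
        else:
            col_c.append(answer)
    return col_c


def last_per_question(rows: List[Tuple[str, str]], col_c: List[str]) -> List[Tuple[str, str]]:
    """Keep each question's last row, which holds the complete string."""
    result: List[Tuple[str, str]] = []
    for i, (question, _) in enumerate(rows):
        if i == len(rows) - 1 or rows[i + 1][0] != question:
            result.append((question, col_c[i]))
    return result


for q, merged in last_per_question(ROWS, running_concat(ROWS)):
    print(f"{q},{merged}")
# 1,yes.is possible.
# 2,no.not possible.cannot do this.impossible.
# 3,maybe.
```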
Create a two-column project in Google Refine.
Sort by the Q column (if not already sorted) and make the sort permanent.
Use Blank down on the Q column to remove duplicate values.
On the A column, use Edit cells -> Merge multi-valued cells.
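The two Refine steps can be approximated in Python to show what they do. `blank_down` blanks a key that repeats the one above it, and `merge_multi_valued` appends each row with a blank key to the record started above it; both are hypothetical helpers, not Refine's implementation. Refine asks for a separator when merging; an empty separator is assumed here to match the desired output.

```python
from typing import List, Optional, Tuple

ROWS: List[Tuple[str, str]] = [
    ("1", "yes."), ("1", "is possible."),
    ("2", "no."), ("2", "not possible."), ("2", "cannot do this."), ("2", "impossible."),
    ("3", "maybe."),
]


def blank_down(keys: List[str]) -> List[str]:
    """Blank a cell when it equals the cell above it (input must be sorted)."""
    blanked: List[str] = []
    previous: Optional[str] = None
    for key in keys:
        blanked.append("" if key == previous else key)
        previous = key
    return blanked


def merge_multi_valued(keys: List[str], values: List[str], separator: str = "") -> List[Tuple[str, str]]:
    """Append each value whose key is blank to the record started above it."""
    merged: List[List[str]] = []
    for key, value in zip(keys, values):
        if key or not merged:
            merged.append([key, value])
        else:
            merged[-1][1] += separator + value
    return [(key, value) for key, value in merged]


keys = blank_down([q for q, _ in ROWS])
values = [a for _, a in ROWS]
for question, answers in merge_multi_valued(keys, values):
    print(f"{question},{answers}")
```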