How can I merge two tables by rows/columns using SPSSINC MERGE TABLES?
I have tried the following syntax, but the output (orange cells) is not what I want:
SPSSINC MERGE TABLES MATCHLABEL="Column N %" ATTACH=ROWS
MODE=MERGE
/OPTIONS HIDE=YES APPENDTITLE=YES APPENDCAPTION=YES ADDLABELLEAF=YES
HALIGN=RIGHT SEPARATOR="\n".
The format that I want (green cells) would:
Include (A) (B) ... after the third row (header)
Include the sig category in the next row (instead of combining it with Column N %)
Is there any way to do that? Thanks a lot!
I have more than 100 columns in dataprep whose names are like:
my column name 1
my column name 2
I would like to rename the name of the columns to be:
my_column_name_1
my_column_name_2
I have tried a rename that replaces " " with "_". However, Dataprep only changes the first whitespace! Is there any way to change all the whitespaces?
Another question: when I apply a function like rename, it is applied to just one column. I can add more columns by typing the name of each column. Is there any way to select all columns without typing all the names?
Thank you so much!
You can shift-select multiple columns to Transform when the data is in column view mode.
Select the columns to apply to and then choose the transformation.
JSDBroughton's answer did the trick for me, although it's not so clear how to do it. Change your view to Columns (second icon from the left on the toolbar). Select the first column, then hold Shift and select the last column. You should now have all columns selected. Then right-click and select Rename. A new Recipe step will be added with all your columns already included. Then set the Option to "Find and replace".
As for removing all the spaces, I couldn't find any Cloud Dataprep pattern or Regular Expression that would replace every space in my column names in one step. Having said that, my columns had a maximum of 4 spaces, so I simply added the same step multiple times. I used the Regular Expression \s to match spaces and replaced them with underscores.
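For comparison, here is a minimal sketch of the "replace every whitespace" behavior in plain Python. It is not Dataprep recipe syntax. It only illustrates what a global \s replacement should produce, and could be used to clean a header row before importing, assuming that is an option in your pipeline. The function name normalize_column_name is made up for this example.

```python
import re


def normalize_column_name(name: str) -> str:
    """Replace every whitespace character in a column name with an underscore."""
    # re.sub replaces ALL matches by default, not just the first one.
    return re.sub(r"\s", "_", name)


# Column names taken from the question.
columns = ["my column name 1", "my column name 2"]
renamed = [normalize_column_name(c) for c in columns]
print(renamed)
```

Because the substitution is global, one pass is enough no matter how many spaces a name contains.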
I've got this type of data in my database. Imagine that File_Name is the column name; I need to take all the rows under "File_Name" and put them into different columns with different names.
File_Name (Column Name)
File1 (First Row)
File2 (Second Row)
File3 (Third Row)
And I need to put them in another file like this:
File_Name1 (Column Name1) ,File_Name2 (Column Name2), File_Name3 (Column Name3)
File1 (Under First column), File2 (Under Second Column), File3 (Under Third column)
Is there a stage that can help me? I tried using the Pivot stage, but I can't really figure out how to set it up with just one input column.
Assuming you just want a single result row from your input (that is what I understood from your question), I would use a Transformer (or Column Generator) to add an artificial column with a value of 1 for all rows.
You have already tried the Pivot Enterprise stage; with that additional column, it will be possible to transform the input into the result you need.
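The idea can be sketched outside DataStage in a few lines of Python. This is only an illustration of the logic (constant key first, then pivot on that key), not the stage configuration itself. The key name grp and the File_Name1, File_Name2, ... output names are assumptions made for the example.

```python
from collections import defaultdict

# Input rows as in the question: one column, several rows.
rows = [{"File_Name": "File1"}, {"File_Name": "File2"}, {"File_Name": "File3"}]

# Step 1 (Transformer / Column Generator): add a constant artificial key.
keyed = [{"grp": 1, **row} for row in rows]

# Step 2 (pivot): all rows sharing the same key collapse into one output row,
# and each value lands in its own numbered column.
pivoted = defaultdict(dict)
for row in keyed:
    out = pivoted[row["grp"]]
    position = len(out) + 1
    out[f"File_Name{position}"] = row["File_Name"]

result = list(pivoted.values())
print(result)
```

Since every row carries the same key, the pivot produces exactly one output row.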
I am working in Cloud Dataprep and I have a case like this:
Basically, I need to create new rows in column 2 based on how many rows there are with matching data in column 1.
Is it possible, and how?
I understand that the scenario you want to have is: obtain all values from column1 that match a value present in column2. There are many things to consider in this scenario, which you did not describe, such as: can values in column2 be repeated? or if there is a value in column2 missing in column1, what should happen? or what happens the other way around?
However, as a general approach to this issue, I would do the following flow:
With a flow such as this one, you take the input table, which has two columns like this:
In recipes FIRST_COLUMN and SECOND_COLUMN you split both columns into different branches and do the necessary steps to clean each column. In column1, I understand nothing needs to be done. In column2, I understand that you will have to remove duplicates (again, this is my guess; it depends on your specific implementation, which you have not completely described) and delete empty values. You can do that by applying the following transforms:
Finally, you can join both columns together. Depending on your needs (only values present in both columns should appear, only values present in columnX should appear, etc.) you should apply a different JOIN strategy. You should use a Join key like column1 = column2 (as in the image), and if you choose only the second column in the left-side menu, you will have a single-column result.
Note that in this case I used an Inner-join, but using other JOIN types will provide completely different results. Use the one that fits your requirements better.
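To make the flow concrete, here is a rough Python sketch of the same three steps (split, clean column2, inner join). It is not Dataprep recipe code, and the sample values are invented purely for illustration.

```python
# Invented sample values standing in for the two columns of the input table.
column1 = ["a", "b", "a", "c", "a"]
column2 = ["a", "", "c", "a", None]

# SECOND_COLUMN branch: delete empty values, then remove duplicates
# (dict.fromkeys keeps the first occurrence and preserves order).
cleaned = list(dict.fromkeys(v for v in column2 if v))

# Inner join on column1 = column2: keep each column1 row whose value
# appears in the cleaned column2. Because column2 was deduplicated,
# every matching column1 row appears exactly once in the output.
keys = set(cleaned)
joined = [v for v in column1 if v in keys]

print(cleaned)
print(joined)
```

Skipping the deduplication step would multiply the matching rows, which is why the cleaning branch matters before the join. A real join engine may also return rows in a different order than this list comprehension does.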
I am trying to merge the columns of files in a folder using Talend. (The files are local.)
Example: there are 4 files in a folder (there could also be 'n' files).
Each file has one column with 100 values.
So after the merge, the output file would have 4 (or 'n') columns with 100 records in it.
Is it possible to merge this way using Talend components?
I tried with 2 files in tMap, but the output records get multiplied (records in the first file * records in the second file).
Any help would be appreciated.
Thanks.
You have to determine how to join data from the different files.
If row number N of each file has to be matched with row number N of the other files, then you must set a sequence on each of your files and join on the sequences to get your result. Be careful: you are totally dependent on the order of the data in each file.
Then you can build this job:
tFileInputdelimited_1 --> tMap_1 --->{tMap_5
tFileInputdelimited_2 --> tMap_2 --->{tMap_5
tFileInputdelimited_3 --> tMap_3 --->{tMap_5
tFileInputdelimited_4 --> tMap_4 --->{tMap_5
In tMaps 1 to 4, copy the input to the output and add a "sequence" column (datatype integer) to your output, populated with Numeric.sequence("IDENTIFIER1",1,1). You then have 2 columns in the output: your data and a unique sequence.
Be careful to use a different identifier for each source.
Then in tMap_5, just join on the different sequences and pick up each input column.
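The same logic can be sketched in Python to show why the sequence join avoids the multiplied rows. This is only an illustration of the technique, not Talend code. The helper with_sequence and the sample values are made up, and in-memory lists stand in for the files.

```python
def with_sequence(values, start=1):
    """Attach a 1-based sequence number to each value (like tMap_1..tMap_4)."""
    return {seq: value for seq, value in enumerate(values, start=start)}


# One list per input file; each file holds a single column.
file_columns = [
    ["a1", "a2", "a3"],
    ["b1", "b2", "b3"],
    ["c1", "c2", "c3"],
]

# Each source gets its own independent sequence.
sequenced = [with_sequence(col) for col in file_columns]

# Inner join on the sequence number (like tMap_5): keep only the
# sequence values present in every source.
common = sorted(set.intersection(*(set(s) for s in sequenced)))
merged = [[s[seq] for s in sequenced] for seq in common]

for row in merged:
    print(row)
```

Joining on an explicit key gives one output row per sequence number instead of a cross product, and the result depends entirely on the row order inside each file.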
This question is something that a lot of people learning bioinformatics and new to DNA data analysis struggle with:
Let's say I have 20 tables with the same column headings. Each table represents a patient sample and each row represents a locus (site) which has mutated in that sample. Each site is uniquely identified by two columns together: chromosome number and base number (e.g. 1 and 43535, 1 and 33456, 1 and 3454353). There are several columns which give different characteristics of each mutation, including a column called Gene which gives the gene at that site. Multiple sites can be mutated in a gene, meaning the Gene column can have the same value multiple times in one table.
I want to query all these tables at the same time by, let's say, Gene. I input a value from the Gene column and I want as output the names of all the tables (samples) in which the gene name is present in the Gene column, and (preferably) also the entire line(s) for each sample, so that I can compare the characteristics of the mutation in that gene across multiple samples on one output page.
I also want to input a number, say 4, and get as output a list of genes which have mutated in at least 4 of 20 patients (the list of genes whose names appear in the Gene column in at least 4 of 20 tables).
What is the "easiest way" to do this? What is the "best way", assuming I want to make more flexible queries besides these two?
I am an MD and do not have any particular software expertise, but I am willing to put in the necessary time to build this query system. A few lines of code won't put me off.
Eg data:
Func Gene ExonicFunc Chr Start End Ref Obs
exonic ACTRT2 nonsynonymous SNV 1 2939346 2939346 G A
exonic EIF4G3 nonsynonymous SNV 1 21226201 21226201 G A
exonic CSMD2 nonsynonymous SNV 1 34123714 34123714 C T
This is just a third of the columns. Multiple columns were removed to fit the page size here...
Thank you.
Create a view that unions all the tables together. You should probably add additional information about which table each row comes from:
create view allpatients as
select 'a' as whichtable, t.*
from tableA t
union all
select 'b' as whichtable, t.*
from tableB t
...
You might find that it is easier to "instantiate" the view by creating a table with all patients. Just have a stored procedure that recreates the table by combining the 20 tables.
Alternatively, you could find that you have large individual tables (millions of rows). In this case, you would want to treat each of the original tables as a partition.
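As a small runnable sketch of the view approach, the following uses Python's built-in sqlite3 module with two tiny tables. The database engine, the reduced column set (gene, chr, pos) and the toy rows are all assumptions for illustration; real tables would carry the full set of columns, and there would be one select per patient table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table tableA (gene text, chr integer, pos integer);
    create table tableB (gene text, chr integer, pos integer);

    -- toy rows, not real patient data
    insert into tableA values ('ACTRT2', 1, 100), ('EIF4G3', 1, 200);
    insert into tableB values ('ACTRT2', 1, 100);

    -- one select per patient table, each tagged with its source
    create view allpatients as
        select 'a' as whichtable, t.* from tableA t
        union all
        select 'b' as whichtable, t.* from tableB t;
    """
)

# Query every patient table at once through the view.
rows = conn.execute(
    "select whichtable, gene, pos from allpatients "
    "where gene = ? order by whichtable",
    ("ACTRT2",),
).fetchall()
print(rows)
```

The whichtable column is what lets a single query report which samples contain the gene.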
If what you have is a bunch of Excel files, you can import them all into the same table, with a distinct column for patient id. There is no need to create 20 different tables for this -- in fact, it would be a bad idea.
Once you do, go to Access' query design, SQL view and use these queries:
To create a query that returns all fields for the input gene name:
select *
from gene_data
where gene = [GeneName]
To create a query that returns gene names that are mutated in at least 4 samples:
select gene
from
(select gene, sample_id
from gene_data
group by gene, sample_id) g
group by gene
having count(sample_id) >= 4
After this, change to design view -- you'll see how to create similar queries using the GUI.
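If you would rather script this than use the Access GUI, here is a hedged sketch of the same two queries using Python's built-in sqlite3 module. The table layout (sample_id, gene, chr, pos), the gene names and the rows are invented for the example, and the threshold is a parameter that uses ">=" to match the "at least N samples" wording of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table gene_data (sample_id text, gene text, chr integer, pos integer)"
)
# Invented rows: GENE_A is mutated in 3 samples (twice in s1), GENE_B in 2.
conn.executemany(
    "insert into gene_data values (?, ?, ?, ?)",
    [
        ("s1", "GENE_A", 1, 100),
        ("s1", "GENE_A", 1, 200),
        ("s2", "GENE_A", 1, 100),
        ("s3", "GENE_A", 1, 300),
        ("s1", "GENE_B", 2, 50),
        ("s2", "GENE_B", 2, 50),
    ],
)


def rows_for_gene(gene):
    """Return every row, across all samples, for the given gene."""
    sql = "select * from gene_data where gene = ? order by sample_id, pos"
    return conn.execute(sql, (gene,)).fetchall()


def genes_in_at_least(min_samples):
    """Return genes that are mutated in at least min_samples distinct samples."""
    sql = """
        select gene
        from (select gene, sample_id
              from gene_data
              group by gene, sample_id) g
        group by gene
        having count(sample_id) >= ?
        order by gene
    """
    return [row[0] for row in conn.execute(sql, (min_samples,))]


print(rows_for_gene("GENE_B"))
print(genes_in_at_least(3))
```

The inner group by collapses repeated mutations of a gene within one sample, so the outer count is a count of samples rather than of mutated sites.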