How to find the difference between a CSV file and a file containing only one column of this CSV - diff

I have a CSV file containing some user data. It looks like this:
"10333","","an.10","Kenyata","","Aaron","","","","","","","","","",""
"12222","","an.4","Wendy","","Aaron","","","","","","","","","",""
"14343","","aaron.5","Nanci","","Aaron","","","","","","","","","",""
I also have a file which has an item on each line like this:
an.10
aaron.5
What I want is to find only the lines in the CSV file contained in the list file.
So desired output would be:
"10333","","an.10","Kenyata","","Aaron","","","","","","","","","",""
"14343","","aaron.5","Nanci","","Aaron","","","","","","","","","",""
(Note how an.4 is not contained in this new list.)
I have any environment available to me and am willing to try just about anything except doing this manually: the CSV contains millions of records, and the list itself has about 100k entries.

How unique are the identifiers an.10 and the like?
Maybe a very small *nix shell script would be enough:
for i in $(sort -u list.txt); do grep -F "\"$i\"" data.csv; done
That would, for every unique entry in the list, return all matching lines in the CSV file. It does not match exclusively on the third column, however. (That could be done with awk, for example.)

If the csv file is data.csv and the list file is list.txt, I would do this:
for i in $(cat list.txt); do grep -F -- "$i" data.csv; done
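Both loops above scan data.csv once per list entry, which may be slow with about 100k entries and millions of records. A minimal sketch of a single-pass alternative in Python follows. It assumes the identifier sits in the third CSV field (index 2), as in the sample, and the helper name filter_rows is made up for illustration. The demo rows are the question's sample shortened to six fields.

```python
import csv
import io

def filter_rows(csv_lines, wanted_ids, column=2):
    """Yield CSV rows whose value in `column` (0-based) appears in wanted_ids."""
    # Load the list once into a set so each lookup is a cheap membership test.
    wanted = {line.strip() for line in wanted_ids if line.strip()}
    for row in csv.reader(csv_lines):
        if len(row) > column and row[column] in wanted:
            yield row

# In-memory stand-ins for data.csv and list.txt (assumed file names).
data = io.StringIO(
    '"10333","","an.10","Kenyata","","Aaron"\n'
    '"12222","","an.4","Wendy","","Aaron"\n'
    '"14343","","aaron.5","Nanci","","Aaron"\n'
)
ids = io.StringIO("an.10\naaron.5\n")

out = io.StringIO()
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(filter_rows(data, ids))
print(out.getvalue(), end="")
```

With real files, the two StringIO objects would be replaced by open file handles. Because the comparison is exact and limited to one column, an.1 would not match a row containing an.10.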

Related

Run for-loop only if there is at least one json file

I want to iterate over all json files in a specific subdirectory.
#!/bin/sh
source_dir="nodes"
for file_path in $source_dir/*.json;
do
file_name=$(basename $file_path .${file_path##*.})
echo $file_name
done
My code is working as expected if there is at least one json file in the directory.
If there is no json file in the directory, the loop will still be executed. The file_name is then "*".
How do I have to change the for loop so that it is only executed if there is at least one json file in the directory?
You can wrap your loop in an if clause that checks whether the pattern matches anything.
See this SO question for how to do that: Check if a file exists with wildcard in shell script
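For comparison, and only as a sketch outside the shell: Python's pathlib returns no matches when nothing fits the pattern, instead of handing back the literal pattern, so the loop body never runs for an empty directory. The function name json_stems and the file name node1.json are assumptions made for this illustration.

```python
import tempfile
from pathlib import Path

def json_stems(source_dir):
    """Return the base names (without .json) of the JSON files in source_dir."""
    return sorted(p.stem for p in Path(source_dir).glob("*.json"))

with tempfile.TemporaryDirectory() as tmp:
    # Empty directory: the list is empty, so a for loop over it does nothing.
    print(json_stems(tmp))
    (Path(tmp) / "node1.json").write_text("{}")
    for name in json_stems(tmp):
        print(name)
```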

How do I extract the last string of a csv file and append it to the other?

I have a CSV file with many rows, each with 101 columns; the 101st column is a char, while the rest of the columns are doubles. E.g.:
1,-2.2,3 ... 98,99,100,N
I implemented a filter that operates on the numbers and wrote the result to a different file, but now I need to map the last column of my old CSV onto my new CSV. How should I approach this?
I did the original loading using loadcsv, but that didn't seem to load the character. How should I proceed?
In MATLAB there are many ways to do it; this answer uses tables:
Input
test.csv
1,2,5,A
2,3,5,G
5,6,8,C
8,9,7,T
test2.csv
1,2,1.2
2,3,8
5,6,56
8,9,3
Script
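t1 = readtable('test.csv','ReadVariableNames',false); % Read the first csv file (it has no header row)
lastcol = t1{:,end}; % Extract the last column
t2 = readtable('test2.csv','ReadVariableNames',false); % Read the second csv file (no header row either)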
t2.addedvar = lastcol; % Add the last column of the first file to the table from the second file
writetable(t2,'test3.csv','Delimiter',',','WriteVariableNames',false) % write the new table in a file
Note that test3.csv is a new file, but you could also overwrite test2.csv.
'WriteVariableNames',false writes the csv file without the table's header row.
Output
test3.csv
1,2,1.2,A
2,3,8,G
5,6,56,C
8,9,3,T

Splitting one column into multiple columns from a CSV file in PowerShell

I am new to using powershell and I am in need of some assistance.
I have a csv file that looks like this:
DisplayName,AllJSSUSers,ALLMobileDevices,LimitToUsers,Exclusions,DepartmentEx,IconURL,ID
Aurasma,TRUE,TRUE,"G_Year 4,G_Year 7,G_Year 11,G_Year 6,G_Year 10,G_Year 5,G_Year 9,G_Teaching Staff,G_Year 8,G_Supply Teachers,G_Year 3,G_Year 12",,,,5
What I would like to do is split the column "LimitToUsers" at the commas into multiple columns and then output the result to a new CSV file.
I have no idea where to start with this. Can anyone help?
Thank you
Gavin
You can read CSV data with Import-Csv.
You can access that column from each data object by accessing the LimitToUsers property.
You can split a string with the -split operator.
You can add new properties to an object with Add-Member.
You can write CSV with Export-Csv.
How exactly the single column should map onto multiple new ones (how many columns, and what they are called) is up to you; I can't help you there.
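The read, split, add columns, write sequence listed above can be sketched as follows. This is Python rather than PowerShell, and it makes two assumptions the question does not settle: the new columns are named LimitToUsers_1, LimitToUsers_2, and so on, and they are appended at the end of the row rather than kept in the original position. The function name split_column is made up, and the demo input is a shortened version of the sample row.

```python
import csv
import io

def split_column(rows, column, sep=","):
    """Replace `column` in each dict row with numbered columns column_1, column_2, ..."""
    rows = list(rows)
    parts = [r[column].split(sep) if r[column] else [] for r in rows]
    # Every row gets the same number of new columns; short rows are padded.
    width = max((len(p) for p in parts), default=0)
    new_names = [f"{column}_{i + 1}" for i in range(width)]
    result = []
    for row, values in zip(rows, parts):
        new_row = {k: v for k, v in row.items() if k != column}
        padded = values + [""] * (width - len(values))
        new_row.update(zip(new_names, padded))
        result.append(new_row)
    return result

# In-memory stand-in for the input file, reduced to three columns.
source = io.StringIO(
    "DisplayName,LimitToUsers,ID\n"
    'Aurasma,"G_Year 4,G_Year 7,G_Year 11",5\n'
)
rows = split_column(csv.DictReader(source), "LimitToUsers")

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue(), end="")
```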

How to place a line which exists in one Perl file into another txt file after a particular string matches

I am writing a Perl script that contains some if/else conditions. There is also a .txt file into which I want to insert the statement selected by those conditions, right after a certain string. I searched for this, but most of the programs I found merge two files. In my case, one file is the Perl script itself, which contains the conditional statements, and the other is the text file in which I want to add the result after a certain string. My files look like this:
File 1
if ($n == 1 && $m == 1) {
    print ".include xyz.txt";
}
elsif ($n == 1 && $m == 0) {
    print ".include abc.txt";
} .....
File 2
lines....
lines....
*matching string
Here I want to append #.include xyz.txt
lines....
lines....
Can both files be processed at the same time, so that the output of my conditional statements is added to the other file? Or do I first have to write the output of file 1 to a separate output file and then append it to the second file? Please help me out. Thanks.
Using perl from the command line:
perl -MFcntl=:seek -pe 'seek(ARGV,0,SEEK_END) if /match/ and !$c++' fil1 fil2
It skips ahead to fil2 when it finds the string match in fil1, and !$c++ ensures that the skip happens only once.
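A different way to read the question is to compute the line from the conditions first and then insert it after the first matching line of the text file. The sketch below does that in Python, not Perl. The names choose_include and insert_after_match are invented for illustration, and the values of n and m are placeholders, since the question does not say where they come from.

```python
def choose_include(n, m):
    """Mirror the two branches shown in the question's File 1."""
    if n == 1 and m == 1:
        return ".include xyz.txt"
    if n == 1 and m == 0:
        return ".include abc.txt"
    return None  # the question elides the remaining branches

def insert_after_match(lines, marker, new_line):
    """Return a copy of lines with new_line added after the first line containing marker."""
    result = []
    inserted = False
    for line in lines:
        result.append(line)
        if not inserted and marker in line:
            result.append(new_line)
            inserted = True
    return result

# In-memory stand-in for File 2.
file2 = ["lines....", "*matching string", "lines...."]
print("\n".join(insert_after_match(file2, "*matching string", choose_include(1, 1))))
```

If the marker never occurs, the lines come back unchanged, so the text file would not be modified.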

How can I copy columns from several files into the same output file using Perl

This is my problem.
I need to copy 2 columns each from 7 different files to the same output file.
All input and output files are CSV files.
And I need to add each new pair of columns beside the columns that have already been copied, so that at the end the output file has 14 columns.
I believe I cannot use
open(FILEHANDLE,">>file.csv").
Also, all 7 CSV files have nearly 20,000 rows each, so I'm reading and writing the files line by line.
It would be a great help if you could give me an idea of what I should do.
Thanks a lot in advance.
Provided that your lines are 1:1 (Meaning you're combining data from line 1 of File_1, File_2, etc):
open all 7 files for input
open output file
read line of data from all input files
write line of combined data to output file
Text::CSV is probably the way to access CSV files.
You could define a CSV handler for each file (including the output), use the getline or getline_hr (returns a hashref) methods to fetch data, combine it into arrayrefs, then use print.
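The line-by-line procedure above can be sketched as follows, in Python rather than Perl, with the csv module standing in for Text::CSV. The function name merge_columns and the column indices are assumptions, and two small in-memory inputs stand in for the 7 files. Reading stops at the shortest input, which matches the 1:1 assumption stated above.

```python
import csv
import io

def merge_columns(sources, columns, out):
    """Write one combined row per input line.

    sources: list of file-like objects, one per input CSV.
    columns: list of tuples with the 0-based column indices to take from each source.
    """
    readers = [csv.reader(src) for src in sources]
    writer = csv.writer(out, lineterminator="\n")
    # zip advances all readers together, one line from each file at a time.
    for rows in zip(*readers):
        combined = []
        for row, cols in zip(rows, columns):
            combined.extend(row[c] for c in cols)
        writer.writerow(combined)

a = io.StringIO("1,2,3\n4,5,6\n")
b = io.StringIO("x,y,z\nu,v,w\n")
out = io.StringIO()
merge_columns([a, b], [(0, 1), (1, 2)], out)
print(out.getvalue(), end="")
```

Only one line per input is held in memory at a time, so the row count of the files does not matter much.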