Using PowerShell to extract data from a large tab-separated text file, mask it, and then merge the masked data back into the original file - powershell

I am a newbie to Windows PowerShell and wanted to know whether it is possible to use PowerShell to extract specific data from a tab-delimited (.dat) file and merge it back into the original file.
The reason for extracting the data is that it is sensitive and requires masking.
After extraction, I would need to mask the data and then merge the masked data back into the original file, in its original positions.
Please provide some pointers, any kind of help would be appreciated.
Thank you in advance.

Solution
Here's a solution based on my limited understanding of your question (if you add more details I may be able to be more specific).
Code
It seems all you need to do is read all the data, modify it, and write it back to the file, so here it is!
$Columns = 2,4 # Zero-based indexes of the columns to mask
# The parentheses force the whole file to be read into memory first,
# so it is safe to write back to the same file at the end of the pipeline
(Get-Content ./lol.dat) | ForEach-Object {
    $arr = $_.Split("`t")
    # Replace each sensitive field with a run of '*' of the same length
    $Columns | ForEach-Object { $arr[$_] = '*' * $arr[$_].Length }
    $arr -join "`t"
} | Out-File ./lol.dat

Related

Is there any limit to the length of text content that a PowerShell variable can hold?

I am storing the content of a text file in a variable like this -
$fileContent=$(Get-Content file1.txt)
Right now file1.txt contains only 200 lines. But if one day the file contains 10 million lines, will this approach still work? Is there any limit to the length of content that a variable can hold in PowerShell?
Get-Content reads the whole file into memory.
That being said, you'd want to change your approach. PowerShell, being built on top of the .NET Framework, has access to all of its capabilities, so you can use classes such as StreamReader, which reads the file from disk one line at a time, as in the example below.
$file = [System.IO.StreamReader]::new('.\Desktop\adobe_export.reg') # instantiate an instance of StreamReader
while (-not $file.EndOfStream) # if not at the end of the file, continue
{
    # save this to a variable if needed
    $file.ReadLine() # read/display the line
    # more code
}
$file.Close()
$file.Dispose()
First of all, you need to understand that a PowerShell variable is a wrapper around a .NET type, so whatever that type can hold is the answer.
Regarding your actual case, you can check what GetType() returns and search the Microsoft docs for any limit on that type, but there is always a memory limit. If you read a lot of data into memory and then return some of it after filtering, transforming, or completing it, you are filling memory. Instead, you can avoid assigning anything to a variable and use the pipeline's one-at-a-time processing, so that memory is only used for the items currently in the pipeline. You might need to do more than one complex thing with the same input, each needing its own pipeline; in that case you can either re-read the data or, if it can change between reads and you need a snapshot, copy it into a temporary place first.
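For example, here is a minimal sketch of the difference (the ERROR filter is made up for illustration):
# Loads every line into memory at once; fine for small files
$fileContent = Get-Content .\file1.txt
$errorLines = $fileContent | Where-Object { $_ -match 'ERROR' }

# Streams one line at a time; memory use stays flat even for huge files
Get-Content .\file1.txt |
    Where-Object { $_ -match 'ERROR' } |
    Set-Content .\errors.txt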

Octave: create .csv files with varying file names stored in a sub folder

I have multiple arrays with string data. All of them should be exported into a .csv file. The file should be saved in a subfolder. The file name is variable.
I used the code as follows:
fpath = ('./Subfolder/');
m_date = inputdlg('Date of measurement [yyyymmdd_exp]');
m_name = inputdlg('Characteristic name of the experiment');
fformat = ('.csv');
fullstring = strcat(fpath, m_date,'_', m_name, fformat);
dlmwrite(fullstring,measurement);
However, I get an error that FILE must be a filename string or numeric FID
What's the reason?
Best
Andreas
What you are asking to do is fairly straightforward in Matlab or Octave. The first part is creating a file with a filename that changes; the best way to do this is to concatenate strings to build the one you want.
You can use: fullstring = strcat('string1','string2')
Or specifically: filenameandpath = strcat('./Subfolder/FixedFileName_',fname)
Note that because strings are pretty much just character arrays, you can also use:
fullstring = ['string1','string2']
Now, if you want to create CSV data, you'll first have to read in the file, possibly parse the data in some way, and then save it. As Andy mentioned above, you may just be able to use dlmwrite to create the output file. We'd need to see a sample of the string data to know whether any more work is needed before dlmwrite can handle it.
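As a side note, a likely cause of the FILE must be a filename string or numeric FID error in the original code is that inputdlg returns a cell array, so strcat produces a cell array rather than a character string, which dlmwrite rejects. A minimal sketch of the fix, reusing the variable names from the question:
fpath = './Subfolder/';
m_date = inputdlg('Date of measurement [yyyymmdd_exp]'); % returns a 1x1 cell array
m_name = inputdlg('Characteristic name of the experiment');
fformat = '.csv';
% index with {1} to unwrap the cell arrays into plain character strings
fullstring = strcat(fpath, m_date{1}, '_', m_name{1}, fformat);
dlmwrite(fullstring, measurement);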

Reading all the files in sequence in MATLAB

I am trying to read all the images in the folder in MATLAB using this code
flst=dir(str_Expfold);
But it shows me output like this, which is not the sequence I want.
Can anyone please tell me how I can read all of them in sequence?
If you downvote, please explain the reason for that too.
In alphabetical order, depth10 comes before depth2. If at all possible, when creating string + number filenames, use a fixed-width numerical part (e.g. depth01, depth02); this tends to avoid sorting problems (see the sketch after the loop below).
If you are stuck with the filenames you have and know the filename pattern, though, you can skip dir entirely and create your filename list in the correct order in the first place:
for n = 1:50
    fname = sprintf('depth%d.png',n); % build each filename in numerical order
    % code to read and process images goes here
end
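As an aside, if you are generating the files yourself, zero-padding the counter keeps alphabetical and numerical order in agreement. A sketch (the imwrite call is just a placeholder for however you create the files):
for n = 1:50
    fname = sprintf('depth%02d.png', n); % depth01.png, depth02.png, ... sort correctly
    % imwrite(img, fname); % hypothetical write call
end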
According to the Matlab forums, the sorting of the dir command's output is not specified, but it seems to be purely alphabetical (by purely I mean that it does not put shorter filenames first). Therefore, you have to sort the names manually. The following code is taken from this link (you probably want to change the file extension):
list = dir(fullfile(cd, '*.mat'));
name = {list.name};             % cell array of filenames
str = sprintf('%s#', name{:});  % concatenate all names, separated by '#'
num = sscanf(str, 'r_%d.mat#'); % extract the numeric part (assumes names like r_1.mat)
[dummy, index] = sort(num);     % sort the numbers
name = name(index);             % reorder the filenames numerically

Matlab publish - Want to use a custom file name to publish several pdf files

I have several data log files (here: 34) for which I have to calculate certain values. I wrote a separate function to publish the results of the calculation in a PDF file, but I can only publish the files one after another, so it takes a while to publish all 34.
Now I want to automate that with a loop: import the data, calculate the values, and publish the results for every log file in a new PDF file. At the end I want 34 PDF files, one for every log file.
My problem is that I couldn't find a way to rename the PDF files during publishing. The PDF file is always named after the script that calculates the values, so the PDF is overwritten on each pass of the loop. At the end everything is calculated, but I only have the PDF from the last calculated log file.
There was this hacky solution that modifies the Matlab publish script, but since I don't have admin rights I can't use it:
"This is really hacky, but I would modify publish to accept a new option prefix. Replace line 93
[scriptDir,prefix] = fileparts(fullPathToScript);
with
if ~isfield(options, 'prefix')
    [scriptDir,prefix] = fileparts(fullPathToScript);
else
    [scriptDir,~] = fileparts(fullPathToScript);
    prefix = options.prefix;
end
Now you can set options.prefix to whatever filename you want. If you want to be really hardcore, make the appropriate modifications to supplyDefaultOptions and checkOptionFields as well."
Any suggestions?
Thanks in advance,
Martin
Here's one idea using movefile to rename the resultant published PDF on each iteration:
for i = 1:34
    file = publish(files(i)); % Replace with your own command(s)
    [pathStr,fileName,ext] = fileparts(file);
    newFile = [pathStr filesep() fileName '_' int2str(i) ext]; % Example: append _# to each
    [success,msg,msgid] = movefile(file,newFile);
    if ~success
        error(msgid,msg);
    end
end
Also used are fileparts and filesep. See this question for other ways to rename and move files.
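If your version of MATLAB's publish accepts an options structure (recent releases support fields such as format and outputDir), an alternative that avoids renaming altogether is to publish each run into its own output folder. A sketch, where calcScript.m and the report folder names are placeholders for your own:
for i = 1:34
    opts = struct('format', 'pdf', 'outputDir', fullfile(pwd, sprintf('report_%02d', i)));
    publish('calcScript.m', opts); % each PDF lands in its own report_## folder
end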

Data separation by particular rows in Matlab

I am relatively new to Matlab and I don't have much knowledge about programming either. For a project I am currently working on, I need to process a lot of data which is logged in the following format:
$GPRMC,202124.985,V,,,,,,,091112,,,N*44
2038,4674,4667,5593,3379
2087,5133,5111,6084,3372
2138,5134,5114,6080,3376
2188,5133,5114,6084,3377
2238,5130,5113,6084,3410
2287,5134,5113,6080,3416
2337,5133,5110,6080,3417
2387,5133,5110,6084,3416
2438,5130,5113,6081,3396
2487,5132,5110,6080,3410
$GPRMC,202125.985,V,,,,,,,091112,,,N*45
2985,5130,5113,6085,3988
3035,5130,5118,6084,4541
3085,5138,5113,6082,5186
3135,5130,5114,6081,6001
3185,5134,5110,6084,6311
3234,5134,5113,6084,6319
3284,5131,5114,6084,6316
3339,5131,5110,6084,6260
3389,5130,5114,6080,6178
3438,5134,5110,6085,6077
$GPRMC,202126.985,V,,,,,,,091112,,,N*46
3942,5131,5114,6085,5916
3992,5130,5110,6084,5917
4042,5133,5110,6084,5950
4091,5131,5114,6080,5996
4142,5134,5114,6085,6062
4192,5134,5114,6084,6129
4242,5134,5110,6080,6150
4291,5130,5110,6079,6186
4341,5130,5110,6089,6246
4391,5130,5118,6083,6266
It continues like this until the end of the file. What I want to do is separate the data so that all the $GPRMC rows are listed together as text in one file or array, while all the other (numerical) rows are listed together in another file or array (comma-separated is desirable). Is it even possible? If it is, can you please give me some pointers?
I'm not quite sure what you mean by separated or not separated. If you copy the text you posted into a file such as testf.dat, a simple script like this using fopen, fprintf, and fgets might be what you're looking for:
infile = fopen('testf.dat');
outf1 = fopen('GPRMC.dat','w');
outf2 = fopen('nums.dat','w');
tline = fgets(infile);
while ischar(tline)
    % strncmp is safe even when a line is shorter than 6 characters
    if strncmp(tline, '$GPRMC', 6)
        fprintf(outf1, '%s', tline); % '%s' keeps any format characters in the data literal
    else
        fprintf(outf2, '%s', tline);
    end
    tline = fgets(infile);
end
fclose(infile);
fclose(outf1);
fclose(outf2);