How to import Pascal VOC 2012 segmentation dataset to Google Colab? - image-segmentation

I am new to building data pipelines. I want to import the Pascal VOC dataset into Google Colab.
Can someone please point me to a good Google Colab/Jupyter notebook?

You can download a file from Google Drive using gdown. The Pascal VOC dataset is already uploaded to Drive, so you can use the following commands:
!gdown https://drive.google.com/uc?id=1-7H7hdaaLZnYidfi0faQdq7Dn0CFhDOA
!unzip cl-rg-objectsdetection-recognization.zip.com%2F20210327%2Fauto%2Fstorage%2Fgoog4_request\&X-Goog-Date\=20210327T174906Z\&X-Goog-Expires\=259199\&X-Goog-SignedHeaders\=hos
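Once the archive is extracted, the segmentation images and masks still need to be matched up. The following is a minimal sketch, assuming the extracted data follows the standard VOC2012 layout (JPEGImages, SegmentationClass, and ImageSets/Segmentation/train.txt); whether the Drive archive above unpacks to that layout is not confirmed here. The function name list_voc_segmentation_pairs is hypothetical.

```python
from pathlib import Path


def list_voc_segmentation_pairs(voc_root, split="train"):
    """Return (image_path, mask_path) pairs for one segmentation split.

    Assumes the standard VOC2012 directory layout under voc_root.
    """
    voc_root = Path(voc_root)
    split_file = voc_root / "ImageSets" / "Segmentation" / f"{split}.txt"
    pairs = []
    for line in split_file.read_text().splitlines():
        image_id = line.strip()
        if not image_id:
            continue
        image_path = voc_root / "JPEGImages" / f"{image_id}.jpg"
        mask_path = voc_root / "SegmentationClass" / f"{image_id}.png"
        # Keep only IDs whose image and mask both exist on disk.
        if image_path.is_file() and mask_path.is_file():
            pairs.append((image_path, mask_path))
    return pairs
```

In Colab this might be called as list_voc_segmentation_pairs("/content/VOCdevkit/VOC2012"), where the path is an assumption about where the archive was extracted.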

Related

pyLDAvis visualization from gensim not displaying the result in google colab

import pyLDAvis.gensim
# Visualize the topics
pyLDAvis.enable_notebook()
vis = pyLDAvis.gensim.prepare(lda_model, corpus, id2word)
vis
The above code displayed the visualization of the LDA model in Google Colab, but after reopening the notebook it stopped displaying.
I even tried
pyLDAvis.display(vis, template_type='notebook')
but it still does not work.
When I set
pyLDAvis.enable_notebook(local=True)
it does display the result, but not the labels. Any help would be appreciated!
When you install pyLDAvis, make sure to pin the version to 2.1.2 with:
!pip install pyLDAvis==2.1.2
The newer versions don't seem to play well with Colab.
In newer versions the module was renamed from pyLDAvis.gensim to pyLDAvis.gensim_models. Use it like this:
import pyLDAvis.gensim_models
vis = pyLDAvis.gensim_models.prepare(lda_model, corpus, id2word)
vis
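Because the module name depends on the installed pyLDAvis version, a notebook can try the new name first and fall back to the old one. This is a minimal sketch; import_first_available is a hypothetical helper, not part of pyLDAvis.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that can be loaded."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module (or its parent package) is missing; try the next name.
            continue
    raise ImportError(f"None of these modules could be imported: {module_names}")


# Usage in the notebook (requires pyLDAvis to be installed):
# gensim_vis = import_first_available(["pyLDAvis.gensim_models", "pyLDAvis.gensim"])
# vis = gensim_vis.prepare(lda_model, corpus, id2word)
```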

Using .traineddata with passportEye Python for MRZ

I am trying to improve the accuracy of passport MRZ reading with Tesseract OCR and PassportEye. I have found a few GitHub repositories containing "*.traineddata" files; they say to move the file into the Tesseract tessdata folder, which I did. Nowhere in the READMEs of these repos does it say how to use it. I believe it is something trivial, but I am very new to Tesseract.
How do I use it with PassportEye in Python? I am completely lost here and have searched a lot. Here is the current code.
import os
from passporteye import read_mrz
pr_path = os.getcwd()
file_path = os.path.join(pr_path,'my_app', 'data')
mrz = read_mrz(file_path + '/test1.jpg')
print(mrz)
This is the .traineddata file I want to test for better accuracy: https://github.com/DoubangoTelecom/tesseractMRZ/blob/master/tessdata_best/mrz.traineddata
I do not want to use bulky OpenCV. Please help.
From looking at the source code, I would say you can't without changing the PassportEye codebase.
Normally you would pass the language you are using to tesseract via the -l parameter, in your case:
-l mrz
But the PassportEye implementation does not give you that option:
https://github.com/konstantint/PassportEye/blob/929c186c4dfa80a1ac975b5f2b95002ca12889d0/passporteye/util/ocr.py#L48
They pass lang=None; you would need to change that part to lang='mrz':
pytesseract.run_tesseract(input_file_name,
                          output_file_name_base,
                          'txt',
                          lang='mrz',
                          config=config)
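To check that the mrz.traineddata file is picked up at all, independently of PassportEye, the tesseract command line can be called with -l mrz directly. This is a minimal sketch that assumes the tesseract executable is on the PATH; the helper names build_tesseract_command and run_tesseract are hypothetical, and the optional --tessdata-dir flag is only needed if the file is not in the default tessdata folder.

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang="mrz", tessdata_dir=None):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    command = ["tesseract", str(image_path), str(output_base), "-l", lang]
    if tessdata_dir is not None:
        # Point tesseract at a non-default folder containing LANG.traineddata.
        command += ["--tessdata-dir", str(tessdata_dir)]
    return command


def run_tesseract(image_path, output_base, lang="mrz", tessdata_dir=None):
    """Run tesseract and return the path of the text file it writes."""
    command = build_tesseract_command(image_path, output_base, lang, tessdata_dir)
    subprocess.run(command, check=True)
    return f"{output_base}.txt"
```

If run_tesseract("test1.jpg", "out") fails with a message about the language not being found, the traineddata file is not in the folder tesseract is reading from.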

Plot3D Reader Sample Data

Is there a place to find sample CFD data in this format please?
I am trying to create scripts for ParaView to automate opening files, and I want to use the track function.
Data from this article can be used to test out the file format:

How does one read multiple DICOM and PNG files at once using pydicom.read_file() and cv2.imread()?

I am currently working on a fully convolutional neural network (CNN) for renal segmentation in MR images. I have 40 images and their ground-truth labels, and I am attempting to load all of the images for pre-processing.
I am using Google Colab, with the latest versions of pydicom and pip installed, for this project. Google Drive is mounted in Colab, and the code below shows the correct paths to the images and their masks in the pydicom.read_file() and cv2.imread() calls, respectively.
However, when I use the "/../IMG*.dcm" or "/../IMG*.png" file paths (which should be legal?), I receive a "FileNotFoundError" as listed below. But, when I specify a specific .dcm or .png image, the pydicom.read_file() and cv2.imread() calls function quite normally.
Any suggestions on how to resolve this issue? I am struggling a lot with loading the data and pre-processing but have the model architecture ready to go once these preliminary hurdles are overcome.
#import data as data
import pydicom
import numpy as np
images= pydicom.read_file("/content/drive/My Drive/CHOAS_Kidney_Labels/Training_Images/T1DUAL/IMG*.dcm");
numpyArray = images.pixel_array
masks= cv2.imread("/content/drive/My Drive/CHOAS_Kidney_Labels/Ground_Truth_Training/T1DUAL/IMG*.png");
-----> FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/CHOAS_Kidney_Labels/Training_Images/T1DUAL/IMG*.dcm'
pydicom.read_file does not support wildcards. You have to iterate over the files yourself, something like (untested):
import glob
import pydicom
pixel_data = []
paths = glob.glob("/content/drive/My Drive/CHOAS_Kidney_Labels/Training_Images/T1DUAL/IMG*.dcm")
for path in paths:
    # Read each DICOM file individually and collect its pixel array.
    dataset = pydicom.dcmread(path)
    pixel_data.append(dataset.pixel_array)
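The same applies to cv2.imread, which also takes a single path. Since glob does not guarantee any ordering, images and masks should be matched explicitly before loading. The following is a minimal sketch, assuming each mask shares its file stem with its image (for example IMG-0001.dcm and IMG-0001.png); that naming convention and the helper name pair_by_stem are assumptions, not something stated in the question.

```python
from pathlib import Path


def pair_by_stem(image_dir, mask_dir, image_pattern="IMG*.dcm", mask_pattern="IMG*.png"):
    """Return sorted (image_path, mask_path) pairs that share a file stem."""
    masks = {path.stem: path for path in Path(mask_dir).glob(mask_pattern)}
    pairs = []
    # Sort so the order is reproducible; glob order is not guaranteed.
    for image_path in sorted(Path(image_dir).glob(image_pattern)):
        mask_path = masks.get(image_path.stem)
        if mask_path is not None:
            pairs.append((image_path, mask_path))
    return pairs
```

Each pair can then be loaded in a loop with pydicom.dcmread(str(image_path)) and cv2.imread(str(mask_path)).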

Read EDI X12 File and convert Using Talend Open Studio

I am new to EDI. I got some information about EDI from here. I heard that Talend supports reading EDI X12 files using a technique called Smooks. I downloaded Talend Open Studio for Data Integration v5.3.1, but I don't know how to use it to read an EDI file.
I got a sample EDI text from a site:
ISA*00* *00* *12*3109992367T *ZZ*IAISNOKIST *070103*0839*^*00307*000024398*0*P*>~
GS*OG*3109992367*IAISNOKIST*20070103*0839*24398*T*004010UCS~
ST*875*000024479~
G50*N*20071230*59590001~
G62*10*20070106~
NTE*GEN*59590001~
NTE*GEN*IF ANY CHANGES OR SHORTAGES PLEASE~
NTE*GEN*CONTACT ALLY SMITH (310) 256-9388~
NTE*GEN*OR EMAIL ASMITH#AOL.COM~
G66*CC*H~
N1*BT*UNIFIED WESTERN GROCERS*9*0063333040005~
N3*PO BOX 11111 TERMINAL WAY~
N4*LOS ANGELES CA 900250000~
N1*ST*CGC MECHANIZED WAREHOUSE*9*0069333040180~
N3*1200 SHEILA AV~
N4*COMMERCE CA 900400000~
N1*BO*MY COMPANY NAME*9*193807245~
G68*10*CA*1.57*006121100201~
G69*SPRINGFIELD APPLESAUCE~
G70*1*5*OZ~
G68*10*CA*3.98*006121100202~
G69*SPRINGFIELD FANCY APPLESAUCE~
G70*1*5*OZ~
G76*100*CA~
SE*23*000024479~
GE*1*24398~
IEA*1*000024398~
I want to save this as an EDI file. What should its extension be? And is there a link to steps or a demo for using Talend to read this file and parse it into a readable format like CSV or XML?
Thanks
Not many people seem to notice this, but Talend Open Studio for Data Integration comes with a default project called the Demo project.
This project contains many examples of how to use Talend, and you should find everything you need there. Every job is well documented.
Official online documentation can be found here: http://www.talendforge.org/tutorials/menu.php (free registration may be required at some point)
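Outside Talend, the structure of the sample can be inspected with a few lines of Python, which may help when checking what a Talend job should produce. This is a minimal sketch, not a Talend or Smooks job; it assumes "*" as the element separator and "~" as the segment terminator, as in the sample above, and the function names are hypothetical.

```python
import csv
import io


def parse_x12(text, element_sep="*", segment_term="~"):
    """Split X12 text into segments, each a list of elements."""
    segments = []
    for raw in text.split(segment_term):
        raw = raw.strip()
        if raw:
            segments.append(raw.split(element_sep))
    return segments


def segments_to_csv(segments):
    """Write one CSV row per segment; the first column is the segment ID."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for elements in segments:
        writer.writerow(elements)
    return buffer.getvalue()
```

For example, segments_to_csv(parse_x12("ST*875*000024479~")) yields a single row whose first column is the segment ID ST. Rows have different lengths because X12 segments carry different numbers of elements.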