Binary files in GridFS (MongoDB) to be stored on a local drive in Python 2.7

I have stored some attachments into GridFS (MongoDB) using the put command in GridFS:
x12 = 'c:\\test\\' + str10            # backslashes escaped so the path is valid
attachment.SaveAsFile(x12)            # save the MAPI attachment to disk
with open(x12, 'rb') as content_file:
    content = content_file.read()     # raw bytes of the attachment
object_id = fs.put(strattach, filename=str10)   # store in GridFS (this stores strattach, see below)
strattach is obtained as follows (A1 is the MAPI attachments collection and attachment is the object obtained from it):
attachment = A1.Item(1)      # processing email attachments using MAPI
strattach = str(attachment)  # converting to a string; without this I get "TypeError: can only write strings or file like objects"
The put was successful and I got back the ObjectId (object_id), which was stored in MongoDB along with the file name.
Now I need to rebuild the binary file from the ObjectId and file name in Python 2.7. To do this I read from GridFS using f2 = object_id.read() and tried to apply the write method on f2, which is failing. The manual says read() in Python 2.7 returns a string instance.
Could you please help me with how I can save that instance back as a binary file in Python 2.7?
Any alternate suggestions will also be helpful.
Thanks
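For reference, a minimal sketch of the write-back step in Python 2.7, assuming fs is the same gridfs.GridFS instance used for the put; the database name and output path are placeholders:

import gridfs
from pymongo import MongoClient

# reconnect to the database and GridFS bucket used for the put ('mydb' is a placeholder)
db = MongoClient()['mydb']
fs = gridfs.GridFS(db)

grid_out = fs.get(object_id)   # GridOut object for the stored file
data = grid_out.read()         # in Python 2.7 this is a str holding the raw bytes

# write the bytes back out; 'wb' keeps the content binary-safe on Windows
out_path = 'c:\\test\\restored_' + grid_out.filename
with open(out_path, 'wb') as out_file:
    out_file.write(data)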

Related

Save variables as mat files on S3

I would like to save variables as .mat files on S3. The example on the official site covers tall tables only. Maybe I could use the "system" command to step outside MATLAB, but I am looking for a straightforward solution.
Any suggestion?
It does look like save does not support saving to remote filesystems.
You can, however, write matrices, cells, tables and timetables.
An example which uses writetable:
LastName = {'Smith';'Johnson';'Williams';'Jones';'Brown'};
Age = [38;43;38;40;49];
T = table(Age,LastName)
writetable(T,'s3://.../table.txt')
Note:
To write to a remote location, filename must contain the full path of
the file specified as a uniform resource locator (URL) of the form:
scheme_name://path_to_file/my_file.ext
To obtain the right URL of the bucket, you can navigate to the contents of the S3 bucket, select a file in there, choose Copy path and remove the name of the file (e.g. table.txt).
The alternative is, as you mentioned, a system call:
a = rand(5);
save('matExample','a');
system('aws s3api put-object --bucket mybucket --key=s3mat.mat --body=matExample.mat')
The mat file matExample.mat is then saved as s3mat.mat in the bucket.

Is there a way to copy quickly files from remote location to local in pyspark

I'm copying files from a remote location with lftp using the mget parameter. The task takes approximately 2 minutes to copy 50 XML files from an SFTP machine to my local Unix machine, and I'd like to be able to copy 20k files. An XML file is approximately 15 KB. The dataframe df_files contains the list of all the XML files that I'd like to copy.
I've tried the code below with 20 thousand files; it seems to take a few hours to create a dataframe from those files.
for row in df_files.tolist():
    print row
    # one lftp session (connect, lcd, mget, bye) is started per file, which is what makes this slow
    cmd_p1 = """lftp sftp://username:password!#remotelocation -e "lcd /var/projects/u_admin/folder/;mget /var/projects/storage/folder/""" + row
    cmd_p2 = """;bye " """
    cmd_get_xml = cmd_p1 + cmd_p2
    s = subprocess.call(cmd_get_xml, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
j = 0
for row in df_file.itertuples(index=True, name='Pandas'):
    print getattr(row, 'filename')
    if j == 0:
        # read the first file and capture its schema
        acq = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "Message").load("file:///var/projects/u_admin/folder/" + df_file['filename'].iloc[j])
        schema = acq.schema
    else:
        # reuse the schema and union the new file into the dataframe
        acq2 = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "Message").load("file:///var/projects/u_admin/folder/" + df_file['filename'].iloc[j], schema=schema)
        acq = acq.union(acq2)
    j += 1
I'd like to be able to copy those files in the least amount of time possible.
First, get all your .xml files into one directory using the SCP module for Paramiko. Assuming your .xml files share the same schema (you are already able to union them), once all the XML files are in one directory you can read the entire directory directly rather than reading the files individually.
This will save most of the time you currently spend in the for loop.
import paramiko
from scp import SCPClient

def createSSHClient(server, port, user, password):
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(server, port, user, password)
    return client

ssh = createSSHClient(server, port, user, password)
scp = SCPClient(ssh.get_transport())
Then call scp.get() or scp.put() to do SCP operations.
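For example, a minimal sketch of pulling the whole remote folder in a single transfer, reusing the paths from the question (recursive get is part of the scp module's API):

# copy the entire remote directory in one SCP session instead of one lftp call per file
scp.get('/var/projects/storage/folder/',
        local_path='/var/projects/u_admin/folder/',
        recursive=True)
scp.close()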
acq_all = sqlContext.read.format("com.databricks.spark.xml").option("rowTag","Message").load("file:///var/projects/u_admin/folder/", schema = schema)
I understand your use case might be a little different since you also have an if-else block, but the schema is the same, so that can be handled once the files are read.
You can read one file to get the right schema, or you can define it yourself before the read.
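A short sketch of the first option, inferring the schema from a single representative file before the directory read shown above (the sample file name is a placeholder):

# infer the schema once from a single representative file
sample = sqlContext.read.format("com.databricks.spark.xml") \
    .option("rowTag", "Message") \
    .load("file:///var/projects/u_admin/folder/sample.xml")
schema = sample.schema   # pass this as schema= in the directory read above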

Additional PDF Attachment in E-Mail (ABAP)

I'm currently trying to send the results of a selection via E-Mail, more precisely as an attachment. My goal is to create an XML file (which works so far) and a PDF, both fed from the internal table in which the selected data is held. The internal table is declared with a custom type. My current code for sending the E-Mail with the XML attachment looks like the following:
lr_send_request = cl_bcs=>create_persistent( ).

lr_document = cl_document_bcs=>create_document( i_type    = 'HTM'
                                                i_text    = lt_text
                                                i_subject = lv_subject ).

* ----- converting data of internal table so it is suitable for XML
...
* -----

lr_document->add_attachment( i_attachment_type     = 'BIN'
                             i_attachment_subject  = 'output.xml'
                             i_attachment_size     = xml_size
                             i_attachment_language = sy-langu
                             i_att_content_hex     = xml_content ).

lr_send_request->set_document( lr_document ).
On the web I was only able to find how to convert a spool job (whatever that is :/) into a PDF. With functions like that I might be able to solve my problem, but then I can't attach the XML anymore.
How can I convert the data of the internal table into a PDF file to attach it to the E-Mail in the same way I do with the XML?
There are multiple ways to create a PDF:
1. Create a report with a Smartform and get the output in PDF format. Sample code
2. If your system has an Adobe Forms license, create it with an Adobe form.
3. Use the zcl_pdf class to create a native PDF file.
4. Use the CONVERT_ABAPSPOOLJOB_2_PDF function module to get the printer spool as a PDF (thanks @Sandra Rossi).
If your PDF is simple (no complex tables, vertical text, images, etc.), use the third option; otherwise try the first or second.

Uploading binary file to API using python

I am trying to upload a package binary over the REST API of the storage using Python, but it keeps throwing an error and the file fails to upload.
Below is the code I am using to achieve it :
import json
import requests

jsonheaderup = {'Content-Type': 'application/octet-stream'}
file = open('install.pkg.gz', 'rb')
files = {'file': file}

def upload_code():
    u = requests.post("%s/api/sys/v2/updates" % (url), files=files, verify=False, headers=jsonheaderup)
    l = json.loads(u.text)

upload_code()
The earlier posts didn't really help, but I figured it out by referring to the original requests documentation on streaming uploads. Check the doc here.
As my file was huge (around 1.9 GB), the session was breaking in the middle of the upload process, giving an "Internal error".
Since it is a huge file, I streamed the upload by providing a file-like object in my function:
def upload_code():
    jsonheaderup = {'Content-Type': 'application/octet-stream'}
    # passing the open file object to data= lets requests stream it instead of loading it into memory
    with open('/root/ak-nas-2013-06-05-7-18-1-1-3-nd.pkg.gz', 'rb') as file:
        requests.post("%s/api/system/v1/updates" % (url), data=file, auth=zfsauth, verify=False, headers=jsonheaderup, timeout=None)
At first glance, I can not see any mistake.
Did you see this: Python 3 script to upload a file to a REST URL (multipart request) ?

Upload same name files to google cloud storage then download them with original names

In Google Cloud Storage, if you upload more than one file with the same name, the last upload overwrites whatever was uploaded before it.
So if I want to upload more than one file with the same name, I should append something unique to the file name, e.g. a timestamp or a random UUID.
But by doing so I'll lose the original file name on download, because I want to serve the file directly from Google.
If we use the unique identifier as a folder instead of appending it to the file name, e.g. UUID + "/" + fileName, then we can download the file with its original name.
You could turn on Object Versioning, which will keep the old versions of the object around.
Alternatively, you can set the Content-Disposition header when uploading the object, which should preserve whatever filename you want on download.
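A minimal sketch of that second approach with the Python client library (the bucket name, object prefix and file name are placeholders; content_disposition is a standard blob property):

import uuid
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")   # placeholder bucket name

# store under a unique object name, but keep the original name for downloads
blob = bucket.blob("uploads/%s/%s" % (uuid.uuid4(), "report.pdf"))
blob.content_disposition = 'attachment; filename="report.pdf"'
blob.upload_from_filename("report.pdf")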
Instead of using Object Versioning, you can attach the UUID (or any other unique identifier) and then update the metadata of the object (specifically the Content-Disposition). The following is part of a Python script I've used to remove the forward slashes, added by Google Cloud buckets to represent directories, from multiple objects; it's based on this blog post. Please keep in mind the double quotes around the file name in the Content-Disposition value.
import pathlib

def update_blob_download_name(bucket_name):
    """Update the download name of blobs and remove the path.

    :returns: None
    :rtype: None
    """
    # Storage client, not added to the code for brevity
    client = initialize_google_storage_client()
    bucket = client.bucket(bucket_name)
    for blob in bucket.list_blobs():
        if "/" in blob.name:
            remove_path = blob.name[blob.name.rfind("/") + 1:]  # rfind gives the last occurrence of the char
            ext = pathlib.Path(remove_path).suffix
            remove_id = remove_path[:remove_path.rfind("_id_")]
            new_name = remove_id + ext
            blob.content_disposition = f'attachment; filename="{new_name}"'
            blob.patch()
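Calling it is then just (the bucket name is a placeholder):

update_blob_download_name("my-bucket")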