Python 3.7.3 variables with numbers in while loop conditionals

I'm trying to learn some Python 3 with the simple code below. The point of the code is to have a loop that runs until one of the files exists and contains data. For some reason I get an error when running it, saying that the variable has invalid syntax, as if digits in variable names were illegal (which they aren't?):
$ python3 test.py
File "test.py", line 14
While file1==False and file2==False and file3==False:
^
SyntaxError: invalid syntax
Code:
import os
filePath1 = '/some/path'
filePath2 = '/some/path'
filePath3 = '/some/path'
file1 = False
file2 = False
file3 = False
While file1==False and file2==False and file3==False:
    if os.path.exists(filePath1):
        with open(filePath1,'r') as f:
            try:
                file1 = f.read()
            except:
                print("No file data.")
    if os.path.exists(filePath2):
        with open(filePath2,'r') as f:
            try:
                file2 = f.read()
            except:
                print("No file data.")
    if os.path.exists(filePath3):
        with open(filePath3,'r') as f:
            try:
                file3 = f.read()
            except:
                print("No file data.")
I don't understand this because:
>>>file1=False
>>>file2=False
>>>file1==False and file2==False
True
I'd be grateful for any help

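The invalid syntax error is caused by the capitalized While keyword, which Python doesn't recognize. Python keywords are case-sensitive, so use the reserved keyword while, in lowercase only. Because While is parsed as an ordinary name, the parser only fails at the next token, file1, which is why the error seems to point at the variable.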
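For reference, here is a minimal sketch of the loop with the lowercase keyword, assuming the goal is to poll until one of the files exists and contains data. The function name wait_for_data, the poll interval, and the max_polls cap are illustrative choices that are not in the question.

```python
import os
import time


def wait_for_data(paths, poll_seconds=1.0, max_polls=None):
    """Poll `paths` until one of them exists and is non-empty.

    Returns (path, data), or (None, None) if max_polls is exhausted.
    `max_polls` is a hypothetical safety cap; None means poll forever.
    """
    polls = 0
    # `while` must be lowercase: Python keywords are case-sensitive.
    while max_polls is None or polls < max_polls:
        for path in paths:
            if os.path.exists(path):
                with open(path, "r") as f:
                    data = f.read()
                if data:  # an empty string is falsy, so empty files are skipped
                    return path, data
        polls += 1
        time.sleep(poll_seconds)
    return None, None
```

Calling wait_for_data([filePath1, filePath2, filePath3]) would then block until one of the three files has content.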

Related

Importing two text files to compare as lists sequentially

I'm a student trying to compare two .txt files of string "answers" from a multiple-choice test (a, c, d, b, etc.). I've found some information on different parts of the problem and a possible way to get the comparison I want, but the guide was meant for strings defined in the script, not for a list pulled from a file.
For importing the two files and comparing them, I'm basing my code on my textbook and this video here: Video example
I've got the code up and running, but for some reason I'm only getting a 0.0% match when I expect a 100.0% match, at least for the two text files I'm using, which contain identical answer lists.
import difflib
answer_sheet = "TestAnswerList.txt"
student_sheet = "StudentAnswerList.txt"
ans_list = open(answer_sheet).readlines()
stu_list = open(student_sheet).readlines()
sequence = difflib.SequenceMatcher(isjunk=None, a=ans_list, b=stu_list)
check_list = sequence.ratio()*100
check_list = round(check_list,1)
print(str(check_list) + "% match")
if check_list == 100:
    print('This grade is Plus Ultra!')
elif check_list >= 75:
    print('Good job, you pass!')
else:
    print('Please study harder for your next test.')
# not the crux of my issue, but will accept advice all the same
answer_sheet.close
student_sheet.close
If I add in the close statement at the end for both of the text files, then I receive this error:
Traceback (most recent call last):
File "c:/Users/jaret/Documents/Ashford U/CPT 200/Python Code/Wk 5 Int Assg - Tester code.py", line 18, in <module>
answer_sheet.close
AttributeError: 'str' object has no attribute 'close'
I had to take another look at how my files were being opened and realized that the syntax was for Python 2, not 3. I chose to go with a basic open and a later close to reduce any potential errors on my novice part.
import difflib
f1 = open('TestAnswerList.txt')
tst_ans = f1.readlines()
f2 = open('StudentAnswerList.txt')
stu_ans = f2.readlines()
sequence = difflib.SequenceMatcher(isjunk=None, a=stu_ans, b=tst_ans)
check_list = sequence.ratio()*100
check_list = round(check_list,1)
print(str(check_list) + "% match") # Percentage correct
if check_list == 100:
    print('This grade is Plus Ultra!')
elif check_list >= 75:
    print('Good job, you pass!')
else:
    print('Please study harder for your next test.')
# Visual Answer match-up
print('Test Answers: ', tst_ans)
print('Student Answers:', stu_ans)
f1.close()
f2.close()
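A note on the earlier AttributeError: answer_sheet and student_sheet hold the filename strings, not the file objects returned by open(), so they have no close method. Below is a minimal sketch that sidesteps the issue with a with block, which closes the file automatically. The helper names read_answers and match_percentage are hypothetical, and stripping whitespace from each line is an assumption added here so that stray newlines or spaces don't affect the comparison.

```python
import difflib


def read_answers(path):
    """Return the non-blank lines of a file, stripped of surrounding whitespace."""
    with open(path) as f:  # the file is closed when the block exits
        return [line.strip() for line in f if line.strip()]


def match_percentage(answer_path, student_path):
    """Percentage similarity between two answer lists, rounded to one decimal."""
    key = read_answers(answer_path)
    student = read_answers(student_path)
    matcher = difflib.SequenceMatcher(None, key, student)
    return round(matcher.ratio() * 100, 1)
```

With this, match_percentage('TestAnswerList.txt', 'StudentAnswerList.txt') would give the value that the grading branches above compare against.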

.extractText() returns "invalid literal for decimal"

I'm coding something that will read PDFs online and return a set of keywords found in each document. However, I keep running into a problem with the extractText() function from the PyPDF2 package.
Here's my code to open a PDF and read it:
x = myurl.pdf
if ".pdf" in x:
    remoteFile = urlopen(Request(x, headers={"User-Agent": "Magic-Browser"})).read()
    memoryFile = StringIO(remoteFile)
    pdfFile = PyPDF2.PdfFileReader(memoryFile, strict=False)
    num_pages = pdfFile.numPages
    count = 0
    text = ""
    while count < num_pages:
        pageObj = pdfFile.getPage(count)
        count += 1
        text += pageObj.extractText()
The error that I keep running into on the extractText() line goes like this:
Traceback (most recent call last):
File "errortest.py", line 30, in <module>
text += pageObj.extractText()
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2595, in extractText
content = ContentStream(content, self.pdf)
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2674, in __init__
self.__parseContentStream(stream)
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/pdf.py", line 2706, in __parseContentStream
operands.append(readObject(stream, None))
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 98, in readObject
return NumberObject.readFromStream(stream)
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 271, in readFromStream
return FloatObject(num)
File "/anaconda2/lib/python2.7/site-packages/PyPDF2/generic.py", line 231, in __new__
return decimal.Decimal.__new__(cls, str(value))
File "/anaconda2/lib/python2.7/decimal.py", line 547, in __new__
"Invalid literal for Decimal: %r" % value)
File "/anaconda2/lib/python2.7/decimal.py", line 3872, in _raise_error
raise error(explanation)
decimal.InvalidOperation: Invalid literal for Decimal: '99.-72'
Would be great if someone could help me out! Thanks!
There is too little information to be certain, but PyPDF2 (and now pypdf) improved a lot in 2022. You will probably just need to upgrade to the latest version of pypdf.
If you encounter a bug in pypdf again, please open an issue: https://github.com/py-pdf/pypdf
A good bug ticket contains (1) your pypdf version (2) the code + PDF document that caused the issue.

Unable to parse email (.msg) in python 3.6

I have a set of .msg files stored on the E:/ drive that I have to read and extract some information from. For that, I am using the code below in Python 3.6:
from email.parser import Parser
p = Parser()
headers = p.parse(open('E:/Ratan/msg_files/Test1.msg', encoding='Latin-1'))
print('To: %s' % headers['To'])
print('From: %s' % headers['From'])
print('Subject: %s' % headers['subject'])
The output I get is below.
To: None
From: None
Subject: None
I am not getting the actual values of the To, From, and Subject fields.
Any thoughts on why it is not printing the actual values?
Please download my sample msg file from this link:
drive.google.com/file/d/1pwWWG3BgsMKwRr0WmP8GqzG3WX4GmEy6/view
Here is a demonstration of how to use some of python's standard email libraries.
You didn't show us your input file in the question, and the Google Drive URL is a dead link.
The code below looks just like yours and works fine, so I don't know what is odd about your environment, modulo some Windows 'rb' binary open nonsense, CRLFs, or the Latin1 encoding.
I threw in .upper() but it does nothing beyond showing that the API is case insensitive.
#! /usr/bin/env python3
from email.parser import Parser
from pathlib import Path
import mailbox
def extract_messages(maildir, mbox_file, k=2, verbose=False):
    for n, message in enumerate(mailbox.mbox(mbox_file)):
        with open(maildir / f'{n}.txt', 'w') as fout:
            fout.write(str(message))
    hdrs = 'From Date Subject In-Reply-To References Message-ID'.split()
    p = Parser()
    for i in range(min(k, n)):
        with open(maildir / f'{i}.txt') as fin:
            msg = p.parse(fin)
            print([len(msg[hdr.upper()] or '')
                   for hdr in hdrs])
            for k, v in msg.items():
                print(k, v)
            print('')
            if verbose:
                print(msg.get_payload())
if __name__ == '__main__':
    # from https://mail.python.org/pipermail/python-dev/
    maildir = Path('/tmp/py-dev/')
    extract_messages(maildir, maildir / '2018-January.txt')
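To make the case-insensitivity point concrete, here is a small self-contained sketch using only the standard library. It also shows that input without RFC 822 header lines yields None for every header. One possible explanation for the question's output, offered as an assumption since the sample file is unavailable, is that Outlook .msg files are a binary compound-file format rather than RFC 822 text, so email.parser would find no headers in them. The addresses and message text below are made up for illustration.

```python
from email.parser import Parser

# A made-up RFC 822 message: header lines, a blank line, then the body.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Test message\n"
    "\n"
    "Body text.\n"
)

msg = Parser().parsestr(RAW)
# Header lookup ignores case, so all three spellings find the same field.
print(msg["To"], msg["TO"], msg["to"])
print(msg["subject"])

# Input that does not start with header lines is treated as body only,
# so every header lookup returns None.
not_email = Parser().parsestr("this is not a header line\nmore data\n")
print(not_email["To"], not_email["From"], not_email["Subject"])
```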

UnicodeDecodeError: 'ascii' codec can't decode, with gensim, python3.5

I am using Python 3.5 on both Windows and Linux, but I get the same error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc1 in position 0: ordinal not in range(128)
The error log is the following:
Reloaded modules: lazylinker_ext
Traceback (most recent call last):
File "<ipython-input-2-d60a2349532e>", line 1, in <module>
runfile('C:/Users/YZC/Google Drive/sunday/data/RA/data_20100101_20150622/w2v_coherence.py', wdir='C:/Users/YZC/Google Drive/sunday/data/RA/data_20100101_20150622')
File "C:\Users\YZC\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)
File "C:\Users\YZC\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 88, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
File "C:/Users/YZC/Google Drive/sunday/data/RA/data_20100101_20150622/w2v_coherence.py", line 70, in <module>
model = gensim.models.Word2Vec.load('model_all_no_lemma')
File "C:\Users\YZC\Anaconda3\lib\site-packages\gensim\models\word2vec.py", line 1485, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "C:\Users\YZC\Anaconda3\lib\site-packages\gensim\utils.py", line 248, in load
obj = unpickle(fname)
File "C:\Users\YZC\Anaconda3\lib\site-packages\gensim\utils.py", line 912, in unpickle
return _pickle.loads(f.read())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc1 in position 0: ordinal not in range(128)
1. I checked and found that the default encoding is utf-8:
import sys
sys.getdefaultencoding()
Out[2]: 'utf-8'
2. When reading the file, I also added .decode('utf-8').
3. I added a shebang line at the beginning and declared utf-8.
So I really don't know why Python couldn't read the file. Can anybody help me out?
Here is the code:
# -*- coding: utf-8 -*-
import gensim
import csv
import numpy as np
import math
import string
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob, Word
class SpeechParser(object):
    def __init__(self, filename):
        self.filename = filename
        self.lemmatize = WordNetLemmatizer().lemmatize
        self.cached_stopwords = stopwords.words('english')
    def __iter__(self):
        with open(self.filename, 'rb', encoding='utf-8') as csvfile:
            file_reader = csv.reader(csvfile, delimiter=',', quotechar='|', )
            headers = file_reader.next()
            for row in file_reader:
                parsed_row = self.parse_speech(row[-2])
                yield parsed_row
    def parse_speech(self, row):
        speech_words = row.replace('\r\n', ' ').strip().lower().translate(None, string.punctuation).decode('utf-8', 'ignore')
        return speech_words.split()
    # -- source: https://github.com/prateekpg2455/U.S-Presidential- Speeches/blob/master/speech.py --
    def pos(self, tag):
        if tag.startswith('J'):
            return wordnet.ADJ
        elif tag.startswith('V'):
            return wordnet.VERB
        elif tag.startswith('N'):
            return wordnet.NOUN
        elif tag.startswith('R'):
            return wordnet.ADV
        else:
            return ''
if __name__ == '__main__':
    # instantiate object
    sentences = SpeechParser("sample.csv")
    # load an existing model
    model = gensim.models.Word2Vec.load('model_all_no_lemma')
    print('\n-----------------------------------------------------------')
    print('MODEL:\t{0}'.format(model))
    vocab = model.vocab
    # print log-probability of first 10 sentences
    row_count = 0
    print('\n------------- Scores for first 10 documents: -------------')
    for doc in sentences:
        print(sum(model.score(doc))/len(doc))
        row_count += 1
        if row_count > 10:
            break
    print('\n-----------------------------------------------------------')
It looks like a bug in Gensim when you try to use a Python 2 pickle file that has non-ASCII chars in it with Python 3.
The unpickle is happening when you call:
model = gensim.models.Word2Vec.load('model_all_no_lemma')
In Python 3, during the unpickle it wants to convert legacy byte strings to (Unicode) strings. The default action is to decode with 'ASCII' in strict mode.
The fix will depend on the encoding used in your original pickle file and will require you to patch the gensim code.
I'm not familiar with gensim, so you will have to try the following two options:
Force UTF-8
Chances are, your non-ASCII data is in UTF-8 format.
Edit C:\Users\YZC\Anaconda3\lib\site-packages\gensim\utils.py
Go to line 912
Change the line to read:
return _pickle.loads(f.read(), encoding='utf-8')
Byte mode
Gensim in Python 3 may happily work with byte strings:
Edit C:\Users\YZC\Anaconda3\lib\site-packages\gensim\utils.py
Go to line 912
Change the line to read:
return _pickle.loads(f.read(), encoding='bytes')
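The effect of the encoding argument can be reproduced without gensim. The sketch below hand-builds a tiny protocol-2 pickle of the kind Python 2 would write for the one-byte str '\xc1' (the byte from the traceback) and loads it three ways. The latin-1 variant is not one of the two options above; it is included because the Python pickle documentation names encoding='latin1' as the setting required for NumPy arrays pickled by Python 2, which may be relevant if the model file contains such arrays (an assumption, since the file itself isn't shown).

```python
import pickle

# Protocol-2 pickle of the Python 2 byte string '\xc1':
# PROTO 2, SHORT_BINSTRING (length 1, byte 0xc1), BINPUT 0, STOP.
PY2_PICKLE = b"\x80\x02U\x01\xc1q\x00."

# Default: legacy byte strings are decoded as ASCII in strict mode.
try:
    pickle.loads(PY2_PICKLE)
    default_error = None
except UnicodeDecodeError as exc:
    default_error = exc
print("default:", default_error)

# latin-1 maps every byte to a code point, so decoding cannot fail.
as_latin1 = pickle.loads(PY2_PICKLE, encoding="latin-1")
print("latin-1:", repr(as_latin1))

# 'bytes' leaves legacy strings undecoded.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")
print("bytes:", repr(as_bytes))
```

Which setting is right for a given model file depends on what was pickled, so it is worth trying on a copy of the file before patching the library.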

NSLocalizedString managing translations over app versions [closed]

Here is a scenario:
I write an iPhone app using NSLocalizedString in case I decide to release it in different countries.
I decide to release the app in France.
The translator takes my Localized.strings and does a great job translating.
I update the app and need some more translating.
I'm using genstrings and it overwrites the good work the translator did. Is there an easy way for me to manage my translations across app versions?
Check out this project on GitHub, which provides a Python script that makes genstrings a little bit smarter.
Since I don't like link-only answers (links may die), I'll also drop the Python script here (all credit goes to the author of the linked project).
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Localize.py - Incremental localization on XCode projects
# João Moreno 2009
# http://joaomoreno.com/
# Modified by Steve Streeting 2010 http://www.stevestreeting.com
# Changes
# - Use .strings files encoded as UTF-8
# This is useful because Mercurial and Git treat UTF-16 as binary and can't
# diff/merge them. For use on iPhone you can run an iconv script during build to
# convert back to UTF-16 (Mac OS X will happily use UTF-8 .strings files).
# - Clean up .old and .new files once we're done
# Modified by Pierre Dulac 2012 http://friendcashapp.com
# Changes
# - use logging instead of print
# Adds
# - MIT Licence
# - the first parameter in the command line to specify the path of *.lproj directories
# - an optional parameter to control the debug level (set to info by default)
# Fixes
# - do not convert a file if it is already in utf-8
# - allow multiline translations generated by genstrings by modifying the re_translation regex
# -
# MIT Licence
#
# Copyright (C) 2012 Pierre Dulac
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or substantial
# portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT
# LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
# WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
from sys import argv
from codecs import open
from re import compile
from copy import copy
import os
import shutil
import optparse
import logging
logging.getLogger().level = logging.INFO
__version__ = "0.1"
__license__ = "MIT"
USAGE = "%prog [options] <url>"
VERSION = "%prog v" + __version__
re_translation = compile(r'^"((?:[^"]|\\")+)" = "((?:[^"]|\\")+)";(?:\n)?$')
re_comment_single = compile(r'^/\*.*\*/$')
re_comment_start = compile(r'^/\*.*$')
re_comment_end = compile(r'^.*\*/$')
class LocalizedString():
    def __init__(self, comments, translation):
        self.comments, self.translation = comments, translation
        self.key, self.value = re_translation.match(self.translation).groups()
    def __unicode__(self):
        return u'%s%s\n' % (u''.join(self.comments), self.translation)
class LocalizedFile():
    def __init__(self, fname=None, auto_read=False):
        self.fname = fname
        self.reset()
        if auto_read:
            self.read_from_file(fname)
    def reset(self):
        self.strings = []
        self.strings_d = {}
    def read_from_file(self, fname=None):
        self.reset()
        fname = self.fname if fname == None else fname
        try:
            #f = open(fname, encoding='utf_8', mode='r')
            f = open(fname, encoding='utf_8', mode='r')
        except:
            print 'File %s does not exist.' % fname
            exit(-1)
        try:
            line = f.readline()
            logging.debug(line)
        except:
            logging.error("Can't read line for file: %s" % fname)
            raise
        i = 1
        while line:
            comments = [line]
            if not re_comment_single.match(line):
                while line and not re_comment_end.match(line):
                    line = f.readline()
                    comments.append(line)
            line = f.readline()
            i += 1
            # handle multi lines
            while len(line) > 1 and line[-2] != u';':
                line += f.readline()
                i += 1
            logging.debug("%d %s" % (i, line.rstrip('\n')))
            if line and re_translation.match(line):
                translation = line
            else:
                logging.error("Line %d of file '%s' raising the exception: %s" % (i, self.fname, line))
                raise Exception('invalid file')
            line = f.readline()
            i += 1
            while line and line == u'\n':
                line = f.readline()
                i += 1
            string = LocalizedString(comments, translation)
            self.strings.append(string)
            self.strings_d[string.key] = string
        f.close()
    def save_to_file(self, fname=None):
        fname = self.fname if fname == None else fname
        try:
            f = open(fname, encoding='utf_8', mode='w')
        except:
            print 'Couldn\'t open file %s.' % fname
            exit(-1)
        # sort by key
        self.strings.sort(key=lambda item: item.key)
        for string in self.strings:
            f.write(string.__unicode__())
        f.close()
    def merge_with(self, new):
        merged = LocalizedFile()
        for string in new.strings:
            if self.strings_d.has_key(string.key):
                new_string = copy(self.strings_d[string.key])
                new_string.comments = string.comments
                string = new_string
            merged.strings.append(string)
            merged.strings_d[string.key] = string
        return merged
    def update_with(self, new):
        for string in new.strings:
            if not self.strings_d.has_key(string.key):
                self.strings.append(string)
                self.strings_d[string.key] = string
def merge(merged_fname, old_fname, new_fname):
    try:
        old = LocalizedFile(old_fname, auto_read=True)
        new = LocalizedFile(new_fname, auto_read=True)
        merged = old.merge_with(new)
        merged.save_to_file(merged_fname)
    except Exception, inst:
        logging.error('Error: input files have invalid format.')
        raise
STRINGS_FILE = 'Localizable.strings'
def localize(path, excluded_paths):
    languages = [os.path.join(path,name) for name in os.listdir(path) if name.endswith('.lproj') and os.path.isdir(os.path.join(path,name))]
    print "languages found", languages
    for language in languages:
        original = merged = language + os.path.sep + STRINGS_FILE
        old = original + '.old'
        new = original + '.new'
        if os.path.isfile(original):
            try:
                open(original, encoding='utf_8', mode='r').read()
                os.rename(original, old)
            except:
                os.system('iconv -f UTF-16 -t UTF-8 "%s" > "%s"' % (original, old))
            # gen
            os.system('find %s -name \*.m -not -path "%s" | xargs genstrings -q -o "%s"' % (path, excluded_paths, language))
            try:
                open(original, encoding='utf_8', mode='r').read()
                shutil.copy(original, new)
            except:
                os.system('iconv -f UTF-16 -t UTF-8 "%s" > "%s"' % (original, new))
            # merge
            merge(merged, old, new)
            logging.info("Job done for language: %s" % language)
        else:
            os.system('genstrings -q -o "%s" `find %s -name "*.m" -not -path "%s"`' % (language, path, excluded_paths))
            os.rename(original, old)
            try:
                open(old, encoding='utf_8', mode='r').read()
            except:
                os.system('iconv -f UTF-16 -t UTF-8 "%s" > "%s"' % (old, original))
        if os.path.isfile(old):
            os.remove(old)
        if os.path.isfile(new):
            os.remove(new)
def parse_options():
    """parse_options() -> opts, args
    Parse any command-line options given returning both
    the parsed options and arguments.
    """
    parser = optparse.OptionParser(usage=USAGE, version=VERSION)
    parser.add_option("-d", "--debug",
                      action="store_true", default=False, dest="debug",
                      help="Set to DEBUG the logging level (default to INFO)")
    parser.add_option("-p", "--path",
                      action="store", type="str", default=os.getcwd(), dest="path",
                      help="Path (relative or absolute) to use for searching for *.lproj directories")
    parser.add_option("-e", "--exclude",
                      action="store", type="str", default=None, dest="excluded_paths",
                      help="Regex for paths to exclude ex. ``./Folder1/*``")
    opts, args = parser.parse_args()
    return opts, args
if __name__ == '__main__':
    opts, args = parse_options()
    if opts.debug:
        logging.getLogger().level = logging.DEBUG
    if opts.path:
        opts.path = os.path.realpath(opts.path)
    if opts.excluded_paths:
        opts.excluded_paths = os.path.realpath(opts.excluded_paths)
    logging.info("Running the script on path %s" % opts.path)
    localize(opts.path, opts.excluded_paths)
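The script above is written for Python 2. The core idea behind its merge_with step (keep the translator's existing values, take new keys from the fresh genstrings output, drop keys that no longer exist) can be sketched in a few lines of Python 3. This is a simplified illustration, not a port of the script: it ignores comments and multi-line entries, and the names ENTRY, parse_strings, and merge_strings are hypothetical.

```python
import re

# Matches one `"key" = "value";` line, allowing backslash escapes inside the quotes.
ENTRY = re.compile(r'^"((?:[^"\\]|\\.)+)"\s*=\s*"((?:[^"\\]|\\.)*)";\s*$')


def parse_strings(text):
    """Return {key: value} for each `"key" = "value";` line in .strings text."""
    entries = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def merge_strings(old, new):
    """Keep existing translations, add new keys, drop keys absent from `new`."""
    return {key: old.get(key, value) for key, value in new.items()}


old = parse_strings('"Cancel" = "Annuler";\n"OK" = "OK";\n')       # translated file
new = parse_strings('"Cancel" = "Cancel";\n"Save" = "Save";\n')    # fresh genstrings output
merged = merge_strings(old, new)
print(merged)  # {'Cancel': 'Annuler', 'Save': 'Save'}
```

Here the existing French value for "Cancel" survives, the new "Save" key is added untranslated, and the stale "OK" key is dropped.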
I use http://www.loc-suite.com to translate only the new parts.
I was having a similar issue. I changed a lot of keys for my NSLocalizedString-macros and was frightened that I'd ship the App with missing translations (didn't want to run through the whole App manually and check if everything's there either...).
I tried out the github project that Gabriella Petronella posted but I wasn't really that happy with it, so I wrote my own python module to accomplish what I wanted to do.
(I'm not gonna post the code here, since it's a whole module and not only one script :D)
Here are a couple of options you can choose from:
You can use a hand-written solution, like the script mentioned above, which adds newly introduced strings to the old files without completely rewriting them.
You can also create an additional strings.h file that contains all the strings you have, so you only need to maintain them in one place and genstrings is no longer necessary. However, there is a downside: the strings.h file will be unstructured, which is probably inconvenient for big projects.
Thanks to Best practice using NSLocalizedString
// In strings.h
#define YOUR_STRING_KEY NSLocalizedString(@"Cancel", nil)
// Somewhere else in your code
NSLog(@"%@", YOUR_STRING_KEY);
I actually started using a tool called PhraseApp https://phraseapp.com/projects
It's worth looking into if you have to localise an app!