How do I get all the elements (substrings) of an Iterable, given that iterable.iterator().next() returns only the first one? - Guava

This is my code:
a1 = Splitter.fixedLength(4).split("goodgirl");
System.out.println("a1=" + a1);
a3[1] = a1.iterator().next();
System.out.println("a3[1]=" + a3[1]);
When I use the Splitter class from the Guava library, my string "goodgirl" is split into [good, girl], since the fixed length is 4, and the result is stored in a1.
With a3[1] = a1.iterator().next(); I can get the substring "good" from a1.
How can I get the next substring (i.e. "girl")?

EDIT
Or use Iterables.get(a1, 1); on the Iterable (for an Iterator, the equivalent is Iterators.get(iterator, 1)).
Iterator<String> iterator = Splitter.fixedLength(4).split("goodgirl").iterator();
while (iterator.hasNext()) {
    System.out.println(iterator.next());
}
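next() returns the next token; you should guard each call with hasNext(). Every call to a1.iterator() creates a fresh iterator positioned at the start, which is why a1.iterator().next() always returns the first substring.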

Related

Mirth String Handling

I'm using the code below to try and strip the file extension off the incoming file and replace it with "ACK";
Can't use .lastIndexOf as it's not available in Rhino.
var _filename = String(sourceMap.get('originalFilename'));
pos = -1;
var search = ".";
for (var i = 0; i < _filename.length - search.length; i++) {
    if (_filename.substr(i, search.length) == search) {
        pos = i;
    }
}
logger.info('_pos:' + _pos);
Every time, I get a pos value of -1, i.e. the position of the last full stop is not found.
BUT if I hardcode the filename as "2020049.259317052.HC.P.F3M147-G", it works perfectly.
Is it something to do with sourceMap.get('originalFilename') supplying a non-string or a different character set?
This was tested on Mirth 3.5. Rhino does, in fact, have String.prototype.lastIndexOf in all Mirth versions going back to at least Mirth 3.0. You were correctly converting the Java string from the sourceMap to a JavaScript string; however, that is not necessary in this case.
Java strings share the String.prototype methods as long as there is no conflict in method name. Java strings have a lastIndexOf method of their own, so that is the one being called in my answer. The Java string can then borrow the slice method from JavaScript seamlessly. The JavaScript method returns a JavaScript string.
If for some reason the filename starts with a . and doesn't contain any others, this won't leave you with a blank filename.
var filename = $('originalFilename');
var index = filename.lastIndexOf('.');
if (index > 0) filename = filename.slice(0, index);
logger.info('filename: ' + filename);
That being said, I'm not sure why your original code wasn't working. When I replaced the first line with
var originalFilename = new java.lang.String('2020049.259317052.HC.P.F3M147-G');
var _filename = String(originalFilename);
It gave me the correct pos value of 22.
New Answer
After reviewing and testing what agermano said, I can confirm he is correct.
In your sample code you are setting pos = i but logging _pos.
var newFilename = _filename.slice(0, _filename.lastIndexOf('.'))
Older Answer
First, you are mixing JavaScript types and Java types.
var _filename = String(sourceMap.get('originalFilename'));
Instead, do
var _filename = '' + sourceMap.get('originalFilename');
This will cause a type conversion from Java String to JS string.
Secondly, there is an easier way to do what you are trying to do.
var _filenameArr = ('' + sourceMap.get('originalFilename')).split('.');
_filenameArr.pop() // throw away last item
var _filename = _filenameArr.join('.') // rejoin the array with out the last item
logger.info('_filename:' + _filename)

cgi.parse_multipart function throws TypeError in Python 3

I'm trying to do an exercise from Udacity's Full Stack Foundations course. I have a do_POST method in my subclass of BaseHTTPRequestHandler, and I want to get a POST value named message that is submitted with a multipart form. This is the code for the method:
def do_POST(self):
    try:
        if self.path.endswith("/Hello"):
            self.send_response(200)
            self.send_header('Content-type', 'text/html')
            self.end_headers()
            ctype, pdict = cgi.parse_header(self.headers['content-type'])
            if ctype == 'multipart/form-data':
                fields = cgi.parse_multipart(self.rfile, pdict)
                messagecontent = fields.get('message')
            output = ""
            output += "<html><body>"
            output += "<h2>Ok, how about this?</h2>"
            output += "<h1>{}</h1>".format(messagecontent)
            output += "<form method='POST' enctype='multipart/form-data' action='/Hello'>"
            output += "<h2>What would you like to say?</h2>"
            output += "<input name='message' type='text'/><br/><input type='submit' value='Submit'/>"
            output += "</form></body></html>"
            self.wfile.write(output.encode('utf-8'))
            print(output)
            return
    except:
        self.send_error(404, "{}".format(sys.exc_info()[0]))
        print(sys.exc_info())
The problem is that cgi.parse_multipart(self.rfile, pdict) throws an exception: TypeError: can't concat bytes to str. The implementation was provided in the course videos, but they use Python 2.7 and I'm using Python 3. I've looked for a solution all afternoon but could not find anything useful. What is the correct way to read data passed from a multipart form in Python 3?
I came across this question while trying to solve the same problem.
I found a simple workaround for it.
I just convert the 'boundary' item in the dictionary from a string to bytes, specifying an encoding.
ctype, pdict = cgi.parse_header(self.headers['content-type'])
pdict['boundary'] = bytes(pdict['boundary'], "utf-8")
if ctype == 'multipart/form-data':
    fields = cgi.parse_multipart(self.rfile, pdict)
In my case, it seems to work properly.
To make the tutor's code work on Python 3, there are three errors you'll have to deal with.
If you get any of these error messages
c_type, p_dict = cgi.parse_header(self.headers.getheader('Content-Type'))
AttributeError: 'HTTPMessage' object has no attribute 'getheader'
or
boundary = pdict['boundary'].decode('ascii')
AttributeError: 'str' object has no attribute 'decode'
or
headers['Content-Length'] = pdict['CONTENT-LENGTH']
KeyError: 'CONTENT-LENGTH'
when running
c_type, p_dict = cgi.parse_header(self.headers.getheader('Content-Type'))
if c_type == 'multipart/form-data':
    fields = cgi.parse_multipart(self.rfile, p_dict)
    message_content = fields.get('message')
this applies to you.
Solution
First of all change the first line to accommodate Python 3:
- c_type, p_dict = cgi.parse_header(self.headers.getheader('Content-Type'))
+ c_type, p_dict = cgi.parse_header(self.headers.get('Content-Type'))
Secondly, the error about the 'str' object not having a 'decode' attribute arises because strings are Unicode strings as of Python 3, instead of being equivalent to byte strings as in Python 2. To fix it, add this line just under the one above:
p_dict['boundary'] = bytes(p_dict['boundary'], "utf-8")
Thirdly, to fix the KeyError for 'CONTENT-LENGTH' in pdict, add these lines before the if statement:
content_len = int(self.headers.get('Content-length'))
p_dict['CONTENT-LENGTH'] = content_len
Full solution on my Github:
https://github.com/rSkogeby/web-server
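The second fix above comes down to Python 3 keeping text (str) and binary data (bytes) strictly apart. A minimal sketch of that behaviour follows; the boundary value XyZ123 is made up for illustration and does not come from the course.

```python
# Hypothetical boundary value, as cgi.parse_header would return it (a str).
boundary = "XyZ123"

# Python 3 refuses to mix bytes and str, which is the root of the TypeError.
try:
    delimiter = b"--" + boundary
except TypeError as exc:
    concat_error = exc
    print("concatenation failed:", exc)

# Converting the boundary to bytes first makes the concatenation legal.
boundary_bytes = bytes(boundary, "utf-8")
delimiter = b"--" + boundary_bytes
print(delimiter)  # b'--XyZ123'
```

This is why converting p_dict['boundary'] to bytes before calling the parser avoids the error on the interpreter versions discussed here.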
I am doing the same course and ran into the same problem. Instead of getting it to work with cgi, I am now using urllib.parse. This was shown in the same course just a few lessons earlier.
from urllib.parse import parse_qs
length = int(self.headers.get('Content-length', 0))
body = self.rfile.read(length).decode()
params = parse_qs(body)
messagecontent = params["message"][0]
And you have to remove enctype='multipart/form-data' from your form, so that the browser sends the body URL-encoded, which is the format parse_qs expects.
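To see what parse_qs does with a URL-encoded body without starting a server, here is a small sketch. The helper name first_value and the sample body are assumptions for illustration; in the handler, the bytes would come from self.rfile.read(length).

```python
from urllib.parse import parse_qs


def first_value(body: bytes, field: str, default: str = "") -> str:
    """Return the first value submitted for `field`, or `default` if absent."""
    # parse_qs maps each field name to a list of values, because a form
    # may submit the same name more than once.
    params = parse_qs(body.decode("utf-8"))
    return params.get(field, [default])[0]


# Hypothetical request body, as a browser would send it for a simple form.
sample_body = b"message=Hello+world&other=1"
print(first_value(sample_body, "message"))  # Hello world
print(repr(first_value(sample_body, "missing")))  # ''
```

Using .get with a default avoids the KeyError that params["message"][0] raises when the field is missing from the request.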
In my case I used cgi.FieldStorage to extract the file and name instead of cgi.parse_multipart:
form = cgi.FieldStorage(
    fp=self.rfile,
    headers=self.headers,
    environ={'REQUEST_METHOD': 'POST',
             'CONTENT_TYPE': self.headers['Content-Type'],
             })
print('File', form['file'].file.read())
print('Name', form['name'].value)
Another hack is to edit the source of the cgi module itself.
At the very beginning of parse_multipart (around line 226), change the usage of the boundary to boundary.encode('ascii'), so that it can be concatenated with the bytes literals:
...
    boundary = b""
    if 'boundary' in pdict:
        boundary = pdict['boundary']
    if not valid_boundary(boundary):
        raise ValueError('Invalid boundary in multipart form: %r'
                         % (boundary,))
    nextpart = b"--" + boundary.encode('ascii')
    lastpart = b"--" + boundary.encode('ascii') + b"--"
...
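Note that the cgi module was deprecated in Python 3.11 and removed in Python 3.13, so the cgi-based approaches here only apply to older interpreters. As an alternative sketch (a swapped-in technique, not what the course or the answers above use), the standard-library email package can parse a multipart/form-data body. The helper name parse_multipart_form and the sample boundary and body are assumptions for illustration.

```python
from email import message_from_bytes


def parse_multipart_form(content_type: str, body: bytes) -> dict:
    """Return {field name: [raw bytes values]} for a multipart/form-data body."""
    # Prepend the Content-Type header so the email parser can find the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    fields = {}
    if not message.is_multipart():
        return fields
    for part in message.get_payload():
        # Each part names its form field in the Content-Disposition header.
        name = part.get_param("name", header="content-disposition")
        fields.setdefault(name, []).append(part.get_payload(decode=True))
    return fields


# Hypothetical request data, shaped like what a browser sends for the form.
sample_type = "multipart/form-data; boundary=XyZ123"
sample_body = (
    b"--XyZ123\r\n"
    b'Content-Disposition: form-data; name="message"\r\n'
    b"\r\n"
    b"hello world\r\n"
    b"--XyZ123--\r\n"
)
fields = parse_multipart_form(sample_type, sample_body)
print(fields)
```

In a handler, the two arguments would presumably be self.headers['Content-Type'] and self.rfile.read(int(self.headers['Content-Length'])).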

Get the first elements (take function) of a DStream

I am looking for a way to retrieve the first elements of a DStream created as:
val dstream = ssc.textFileStream(args(1)).map(x => x.split(",").map(_.toDouble))
Unfortunately, there is no take function on a DStream (as there is on an RDD), so dstream.take(2) does not work.
Does anyone have an idea how to do this? Thanks.
You can use the transform method on the DStream: take n elements of each input RDD into a list, then filter the original RDD to keep only the elements contained in that list. This returns a new DStream with up to n elements per batch (more if the RDD holds duplicates of the taken values).
val n = 10
val partOfResult = dstream.transform(rdd => {
  val list = rdd.take(n)
  rdd.filter(list.contains)
})
partOfResult.print
The previously suggested solution did not work for me: the take() method returns an Array, which was not serializable in my setup, so Spark Streaming failed with a java.io.NotSerializableException.
A simple variation on the previous code that worked for me:
val n = 10
val partOfResult = dstream.transform(rdd => {
  rdd.filter(rdd.take(n).toList.contains)
})
partOfResult.print
Sharing a Java-based solution that works for me. The idea is to use a custom function that passes through only the top row of a sorted RDD.
someData.transform(
    rdd -> {
        JavaRDD<CryptoDto> result =
            rdd.keyBy(Recommendations.volumeAsKey)
                .sortByKey(new CryptoComparator()).values().zipWithIndex()
                .map(row -> {
                    CryptoDto purchaseCrypto = new CryptoDto();
                    purchaseCrypto.setBuyIndicator(row._2 + 1L);
                    purchaseCrypto.setName(row._1.getName());
                    purchaseCrypto.setVolume(row._1.getVolume());
                    purchaseCrypto.setProfit(row._1.getProfit());
                    purchaseCrypto.setClose(row._1.getClose());
                    return purchaseCrypto;
                })
                .filter(Recommendations.selectTopInSortedRdd);
        return result;
    }).print();
The custom function selectTopInSortedRdd looks like this:
public static Function<CryptoDto, Boolean> selectTopInSortedRdd = new Function<CryptoDto, Boolean>() {
    private static final long serialVersionUID = 1L;

    @Override
    public Boolean call(CryptoDto value) throws Exception {
        if (value.getBuyIndicator() == 1L) {
            System.out.println("Value of buyIndicator :" + value.getBuyIndicator());
            return true;
        }
        else {
            return false;
        }
    }
};
It basically checks the buyIndicator of every incoming element and returns true only for the first record of the sorted RDD.
This seems to be always an issue with DStreams as well as regular RDDs.
If you don't want to (or can't) use .take() (especially with DStreams), you can think outside the box and use reduce instead. It is a valid operation on both DStreams and RDDs.
Think about it. If you use reduce like this (Python example):
.reduce( lambda x, y : x)
Then what happens is: For every 2 elements you pass in, always return only the first. So if you have a million elements in your RDD or DStream it will shrink it to one element in the end which is the very first one in your RDD or DStream.
Simple and clean.
Keep in mind, however, that .reduce() does not take order into consideration, so which element ends up as the "first" one is not guaranteed. You can overcome this with a custom function.
Example: assume your data looks like x = (1, [1,2,3]) and y = (2, [1,2]), i.e. tuples whose 2nd element is a list. If you want the element with the longest list, for example, your code could look like this (adapt as needed):
def your_reduce(x, y):
    if len(x[1]) > len(y[1]):
        return x
    else:
        return y
yourNewRDD = yourOldRDD.reduce(your_reduce)
Accordingly you will get '(1, [1,2,3])' as that has the longer list. There you go!
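The two reduce functions above can be tried locally without a cluster. The sketch below uses functools.reduce on a plain list as a sequential stand-in; this is an assumption made for illustration, since Spark's reduce runs distributed and is documented to expect a commutative and associative function. lambda x, y: x is not commutative, so on a real cluster the surviving element is not guaranteed to be the first one. The sample records and the name longest_list are made up.

```python
from functools import reduce

# Hypothetical records: (key, list) tuples like the ones described above.
records = [(2, [1, 2]), (1, [1, 2, 3]), (3, [7])]

# Always keeping the left argument collapses the sequence to its first element.
first = reduce(lambda x, y: x, records)


def longest_list(x, y):
    """Keep whichever tuple carries the longer list (ties go to y)."""
    if len(x[1]) > len(y[1]):
        return x
    return y


# A custom function makes the result depend on the data, not on the order.
longest = reduce(longest_list, records)

print(first)    # (2, [1, 2])
print(longest)  # (1, [1, 2, 3])
```

Picking by a property of the data, as longest_list does, gives the same answer for any ordering as long as there are no ties.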
This has caused me some headaches in the past until I finally tried this. Hopefully this helps.

How to get current position of iterator in ByteString?

I have an instance of ByteString. To read data from it I have to use its iterator() method.
I read some data and then decide that I need to create a view (a separate iterator over some chunk of the data).
I can't use slice() on the original iterator, because that would make it unusable. The docs say:
After calling this method, one should discard the iterator it was called on, and use only the iterator that was returned. Using the old iterator is undefined, subject to change, and may result in changes to the new iterator as well.
So it seems that I need to call slice() on the ByteString. But slice() has from and until parameters, and I don't know from. I need something like this:
ByteString originalByteString = ...; // <-- This is my input data
ByteIterator originalIterator = originalByteString.iterator();
// ... read some data from originalIterator ...
int length = 100; // <-- Size of the view
int from = originalIterator.currentPosition(); // <-- I need this
int until = from + length;
ByteString viewOfOriginalByteString = originalByteString.slice(from, until);
ByteIterator iteratorForView = viewOfOriginalByteString.iterator(); // <-- This is my goal
Update:
Tried to do this with duplicate():
ByteIterator iteratorForView = originalIterator.duplicate()._2.take(length);
ByteIterator's from field is private, and none of the methods seems to simply return it. All I can suggest is to use originalIterator.duplicate to get a safe copy, or else to "cheat" by using reflection to read the from field, assuming reflection is available in your deployment environment.

What am I doing wrong with this Python class? AttributeError: 'NoneType' object has no attribute 'usernames'

Hey there, I am trying to write my first class. My code is as follows:
class Twitt:
    def __init__(self):
        self.usernames = []
        self.names = []
        self.tweet = []
        self.imageurl = []

    def twitter_lookup(self, coordinents, radius):
        twitter = Twitter(auth=auth)
        coordinents = coordinents + "," + radius
        print coordinents
        query = twitter.search.tweets(q="", geocode='33.520661,-86.80249,50mi', rpp=10)
        print query
        for result in query["statuses"]:
            self.usernames.append(result["user"]["screen_name"])
            self.names.append(result['user']["name"])
            self.tweet.append(h.unescape(result["text"]))
            self.imageurl.append(result['user']["profile_image_url_https"])
What I am trying to be able to do is then use my class like so:
test = Twitt()
hello = test.twitter_lookup("38.5815720,-121.4944000","1m")
print hello.usernames
This does not work and I keep getting: "AttributeError: 'NoneType' object has no attribute 'usernames'"
Maybe I just misunderstood the tutorial or am trying to use this wrong. Any help would be appreciated thanks.
The error is that test.twitter_lookup("38.5815720,-121.4944000","1m") returns nothing. If you want the usernames, you need to do
test = Twitt()
test.twitter_lookup("38.5815720,-121.4944000","1m")
test.usernames
Your function twitter_lookup is modifying the Twitt object in-place. You didn't make it return any kind of value, so when you call hello = test.twitter_lookup(), there's no return value to assign to hello, and it ends up as None. Try test.usernames instead.
Alternatively, have the twitter_lookup function put its results in some new object (perhaps a dictionary?) and return it. This is probably the more sensible solution.
Also, the function accepts a coordinents (it's 'coordinates') argument, but then throws it away and uses a hard-coded value instead.
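The difference between the two approaches in the answers above can be sketched without the Twitter API. The method names lookup_in_place and lookup_and_return, and the fake_statuses data, are hypothetical stand-ins for twitter_lookup and query["statuses"]; the sketch is written for Python 3.

```python
class Twitt:
    def __init__(self):
        self.usernames = []

    def lookup_in_place(self, statuses):
        """Mutates self and returns nothing, like the original twitter_lookup."""
        for status in statuses:
            self.usernames.append(status["user"]["screen_name"])
        # No return statement, so the caller receives None.

    def lookup_and_return(self, statuses):
        """Builds a fresh result and hands it back to the caller."""
        return {"usernames": [s["user"]["screen_name"] for s in statuses]}


# Stand-in for query["statuses"]; the screen names are made up.
fake_statuses = [
    {"user": {"screen_name": "alice"}},
    {"user": {"screen_name": "bob"}},
]

test = Twitt()
hello = test.lookup_in_place(fake_statuses)
print(hello)           # None
print(test.usernames)  # ['alice', 'bob']

result = Twitt().lookup_and_return(fake_statuses)
print(result["usernames"])  # ['alice', 'bob']
```

Accessing hello.usernames in the first case fails with the AttributeError from the question, because hello is None; the data lives on test instead.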