get Thumbnail image from wikimedia commons - thumbnails

I have a file name from Wikimedia Commons and I want to access the thumbnail image directly.
Example: Tour_Eiffel_Wikimedia_Commons.jpg
I found a way to get json-data containing the url to the thumbnail I want:
https://en.wikipedia.org/w/api.php?action=query&titles=Image:Tour_Eiffel_Wikimedia_Commons.jpg&prop=imageinfo&iiprop=url&iiurlwidth=200
but I don't want another request. Is there a way to access the thumbnail directly?

If you're willing to rely on the current way of building the URL not changing in the future (which is not guaranteed), then you can do it.
The URL looks like this:
https://upload.wikimedia.org/wikipedia/commons/thumb/a/a8/Tour_Eiffel_Wikimedia_Commons.jpg/200px-Tour_Eiffel_Wikimedia_Commons.jpg
The first part is always the same: https://upload.wikimedia.org/wikipedia/commons/thumb
The second part is the first character of the MD5 hash of the file name. In this case, the MD5 hash of Tour_Eiffel_Wikimedia_Commons.jpg is a85d416ee427dfaee44b9248229a9cdd, so we get /a.
The third part is the first two characters of the MD5 hash from above: /a8.
The fourth part is the file name: /Tour_Eiffel_Wikimedia_Commons.jpg
The last part is the desired thumbnail width, and the file name again: /200px-Tour_Eiffel_Wikimedia_Commons.jpg

Solution in Python based on @svick's answer:
import hashlib

def get_wc_thumb(image, width=300):  # image = e.g. from Wikidata, width in pixels
    image = image.replace(' ', '_')  # spaces must be replaced with underscores
    m = hashlib.md5()
    m.update(image.encode('utf-8'))
    d = m.hexdigest()
    return "https://upload.wikimedia.org/wikipedia/commons/thumb/" + d[0] + '/' + d[0:2] + '/' + image + '/' + str(width) + 'px-' + image
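File names with non-ASCII or reserved characters probably need percent-encoding before they go into the URL. Here is a minimal sketch of that variant. It assumes the hash is computed on the underscore-normalized, unencoded name and that only the path segments are encoded; the function name commons_thumb_url is made up for this example.

```python
import hashlib
from urllib.parse import quote

THUMB_BASE = "https://upload.wikimedia.org/wikipedia/commons/thumb"

def commons_thumb_url(file_name, width=300):
    """Build a Commons thumbnail URL from a file name (hypothetical helper)."""
    # Commons stores titles with underscores instead of spaces.
    name = file_name.replace(" ", "_")
    # Assumption: the hash is taken over the unencoded UTF-8 name.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Percent-encode the name only where it appears in the URL path.
    encoded = quote(name)
    return f"{THUMB_BASE}/{digest[0]}/{digest[:2]}/{encoded}/{width}px-{encoded}"

print(commons_thumb_url("Tour Eiffel Wikimedia Commons.jpg", 200))
```

For the example file from the question this should reproduce the URL shown in the first answer.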

In case anyone is doing this query in SPARQL instead of Python:
There exists an MD5 function in SPARQL and the whole string manipulation can be implemented in SPARQL too!
BIND(REPLACE(wikibase:decodeUri(STR(?image)), "http://commons.wikimedia.org/wiki/Special:FilePath/", "") as ?fileName) .
BIND(REPLACE(?fileName, " ", "_") as ?safeFileName)
BIND(MD5(?safeFileName) as ?fileNameMD5) .
BIND(CONCAT("https://upload.wikimedia.org/wikipedia/commons/thumb/", SUBSTR(?fileNameMD5, 1, 1), "/", SUBSTR(?fileNameMD5, 1, 2), "/", ?safeFileName, "/650px-", ?safeFileName) as ?thumb)
The query can be run live in Wikidata's query service; it is discussed here: https://discourse-mediawiki.wmflabs.org/t/accessing-a-commons-thumbnail-via-wikidata/499

XPATH - /a/text(), can't extract email address (text)

I have a simple HTML file with usernames and links to their sub-pages:
someUserName@domain.com
someUserName
I use
xpath('.//a/text()').extract_first()
to extract user name in plain text.
I have a problem when a user specifies the username in the form of an email address (see the first example): an empty object is returned in that case.
Edit: I have just noticed that the HTML changed recently and I hadn't rechecked it:
<td><span class="__cf_email__" data-cfemail="3f4d565c544c5e514bwer4rwre58525e5653115c5052">[email protected]</span></td>
I'll extract from @href.
I used the following code:
import scrapy

inputString = '''<xmlData>
<a>someUserName@domain.com</a>
<a>someUserName</a>
</xmlData>'''
print(scrapy.selector.Selector(text=inputString).xpath('.//a/text()').extract_first())
Output:
someUserName@domain.com
Can you paste your full Python code? The XPath expression itself seems to work fine:
scrapy.selector.Selector(text=inputString).xpath('.//a/text()').extract_first()
Getting the text node children of an element (using text()) is generally discouraged, for exactly the reasons demonstrated here. With <a>content</a> you will get "content", with <a><span>content</span></a> you will get nothing, and with <a>h<sub>2</sub>o</a> you will get two text nodes, "h" and "o".
Use string() to get the string value instead. The string value contains the concatenated content of all the descendant text nodes at any depth ("content", "content", and "h2o" in these three examples).
My only reservation is that I don't know the Scrapy API, so I don't know how it handles XPath expressions that return strings rather than nodes.
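The difference between the direct text nodes and the string value can be seen without Scrapy. This is a rough sketch using Python's standard xml.etree.ElementTree as a stand-in (it does not evaluate text() or string() itself); the two helper names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

def direct_text_nodes(elem):
    """Text nodes that are direct children of elem (roughly what text() selects)."""
    parts = [elem.text] + [child.tail for child in elem]
    return [p for p in parts if p]

def string_value(elem):
    """Concatenated text of all descendants (roughly what string() returns)."""
    return "".join(elem.itertext())

samples = ["<a>content</a>", "<a><span>content</span></a>", "<a>h<sub>2</sub>o</a>"]
for markup in samples:
    elem = ET.fromstring(markup)
    print(markup, direct_text_nodes(elem), repr(string_value(elem)))
```

The second sample has no direct text nodes at all, which matches the empty result seen when the address is wrapped in a span.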

Defaultdict() the correct choice?

EDIT: mistake fixed
The idea is to read text from a file, clean it, and pair consecutive words (not permutations):
file = f.read()
words = [word.strip(string.punctuation).lower() for word in file.split()]
pairs = [(words[i]+" " + words[i+1]).split() for i in range(len(words)-1)]
Then, for each pair, create a list of all the possible individual words that can follow that pair throughout the text. The dict will look like
[ConsecWordPair]:[listOfFollowers]
Thus, referencing the dictionary for a given pair will return all of the words that can follow that pair. E.g.
wordsThatFollow[('she', 'was')]
>> ['alone', 'happy', 'not']
My algorithm to achieve this involves a defaultdict(list)...
wordsThatFollow = defaultdict(list)
for i in range(len(words)-1):
    try:
        # pairs overlap, want second word of next pair
        # wordsThatFollow[tuple(pairs[i])] = pairs[i+1][1]
        EDIT: wordsThatFollow[tuple(pairs[i])].update(pairs[i+1][1][0]
    except Exception:
        pass
I'm not so worried about the value error I have to circumvent with the 'try-except' (unless I should be). The problem is that the algorithm only successfully returns one of the followers:
wordsThatFollow[('she', 'was')]
>> ['not']
Sorry if this post is bad for the community; I'm figuring things out as I go ^^
Your problem is that you are always overwriting the value, when you really want to extend it:
# Instead of this
wordsThatFollow[tuple(pairs[i])] = pairs[i+1][1]
# Do this
wordsThatFollow[tuple(pairs[i])].append(pairs[i+1][1])
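Putting the append fix together, here is a minimal self-contained sketch. It assumes the text is already in a string, and the function name build_followers and the sample sentence are made up for the example. Zipping three shifted copies of the word list avoids indexing past the end, so no try/except is needed.

```python
import string
from collections import defaultdict

def build_followers(text):
    """Map each pair of consecutive words to the list of words that follow it."""
    words = [word.strip(string.punctuation).lower() for word in text.split()]
    followers = defaultdict(list)
    # zip stops at the shortest sequence, so the last pair (no follower) is skipped.
    for first, second, nxt in zip(words, words[1:], words[2:]):
        followers[(first, second)].append(nxt)
    return followers

sample = "She was alone. She was happy; she was not."
print(build_followers(sample)[("she", "was")])
```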

Split location.href for multiple values?

I have a variable that finds the current URL and splits it as follows:
var ehref = window.location.href.split('?',1);
This is then used to match the url with a navigation link href and give an ID to the page. My issue is that when our cookie pop up is closed, # is added to the url. Subsequently the page links are passed around between users with the # and the page ids do not work.
What is a simple way of splitting the URL at a # as well? I am new to jQuery, so I understand the gist of what I'm reading, but everything I've tried from researching the net has broken the page. I can replace the '?' with '#', but that doesn't really solve the issue.
Thanks!
If you want the part of the URL from the '#' onward, you can use this in JavaScript:
window.location.hash
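The underlying idea, dropping both the query string and the fragment in one step, can be sketched outside the browser. This is a Python illustration only, not the page's JavaScript; the function name and the example.com URLs are made up.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query_and_fragment(url):
    """Return the URL without its '?query' and '#fragment' parts."""
    parts = urlsplit(url)
    # Rebuild the URL with empty query and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(strip_query_and_fragment("https://example.com/page.html?x=1#top"))
print(strip_query_and_fragment("https://example.com/page.html#"))
```

Both calls give the same base URL, which is what matching against the navigation link href needs.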
I was searching for a way to split up a URL and build a new URL from part of it, for example to change YouTube.com/"user/video" to YouTube.com/v/"video" so I would not have to sign in to watch a restricted video. I then needed the same code to grab whatever string I want between two markers. So here we go!
Our goal: to isolate a part of a URL and use it within another URL.
The line of code is broken up into sections for easy reading.
The line of code is meant to be run from a bookmark in the browser.
Example URL:
https://duckduckgo.com/?q=School&t=h_&atb=v102-5_f&ia=web
The code:
javascript:var DDG=(window.location.href.split('?q=')[1]);DDG2=DDG.split('&t')[0];DD2G="https://www.google.com/search?q="+DDG2;window.location.assign(DD2G);
Variable names:
DDG = duckduckgo
DDG2 = duckduckgo2
DD2G = duckduckgo 2 google
The code break down:
javascript:var DDG=(window.location.href.split('?q=')[1]);
DDG2 = DDG.split('&t')[0];
DD2G="https://www.google.com/search?q="+DDG2;
window.location.assign(DD2G);
The first part of the code marks it as JavaScript and creates a variable (var) named DDG:
var DDG
Next, we take the current URL of the user's browser and split it into sections:
window.location.href.split
We look for the string '?q=' in the URL, which marks the search query in DuckDuckGo.
We only want what comes after '?q=', which is element [1]; this gives DDG the value School&t=h_&atb=v102-5_f&ia=web
Now we split the value we just gave DDG once more.
This time we only want everything before '&t', so we take element [0] and assign the result to a new variable called DDG2:
DDG2 = DDG.split('&t')[0]
DDG2 now holds the value we wanted, and we can use it in another URL.
DDG2 = School (this changes with every new search.)
Now we build the new URL from a fixed prefix plus our variable.
Our final variable DD2G is https://www.google.com/search?q= followed by the value of DDG2:
DD2G="https://www.google.com/search?q="+DDG2;
This gives https://www.google.com/search?q=School.
Finally, we assign that to the browser, which redirects to the new URL with the search term:
window.location.assign(DD2G);
= window.location.assign("https://www.google.com/search?q=" + (DDG2))
= window.location.assign("https://www.google.com/search?q=School")
= https://www.google.com/search?q=School //our new URL with the search term we started with from DuckDuckGo, without having to retype the query.
So for your question, replace '?q=' with the first string you want the script to look for, then replace '&t' with the second string you want it to look for in that result.
I hope this helps!
If you want to test it out, select all of this:
javascript:var DDG=(window.location.href.split('?q=')[1]);DDG2=DDG.split('&t')[0];DD2G="https://www.google.com/search?q="+DDG2;window.location.assign(DD2G);
and drag it to an empty space in your bookmarks toolbar. This is in Firefox; I do not know whether it works in other browsers, but if they support JavaScript in bookmarks, it should. Now navigate to DuckDuckGo.com, search for something, then click on the bookmark containing that code.
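Splitting on fixed markers such as '?q=' and '&t' depends on the parameter order in the URL. As a hedged alternative, parsing the query string by parameter name avoids that. The sketch below is in Python rather than the bookmark's JavaScript, and the function name duckduckgo_to_google is made up for the example.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

def duckduckgo_to_google(url):
    """Take the 'q' parameter from one URL and reuse it in a Google search URL."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per parameter; use the first 'q' value, or an empty string.
    terms = query.get("q", [""])[0]
    return "https://www.google.com/search?" + urlencode({"q": terms})

print(duckduckgo_to_google("https://duckduckgo.com/?q=School&t=h_&atb=v102-5_f&ia=web"))
```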

Use CSV values in JMeter as request path

I have a JMeter User Defined Variable holding a comma-separated value: ${countries} = IN,US,CA,ALL.
(I first tried to get it as a list/array: [IN,US,CA,ALL].)
I want to use the variable to test a web service: GET /${country}/info. Is it possible using a ForEach Controller or Loop Controller?
The only requirement is that I want to save or read it as IN,US,..,ALL and use it in the request path.
Thanks
The CSV should follow the format shown in the attached image.
Refer to this link on how to use a CSV file in JMeter: http://ivetetecedor.com/how-to-use-a-csv-file-with-jmeter/
Thread Group Settings
No. of threads: 1
Ramp-up period: 1
Loop Count: 4
Hope this will help.
CSV config is a red herring, you don't need it.
You can use a regular expression extractor to split up the variable into another variable (eg MyVar), using something like:
(.+?)[,\n]
This tries to match each item before a comma or newline. It places the values in variables named MyVar_1, MyVar_2, etc. This is as close to an array as JMeter understands natively.
You can then loop over the matches using MyVar_matchNr and MyVar_1 to MyVar_n (you will need to use the __V() function to access the 'array' contents).
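One thing worth checking before relying on the pattern: because it requires a trailing comma or newline, the last item of IN,US,CA,ALL may be missed if the value does not end with one. The quick check below uses Python's re module, on the assumption that JMeter's regex engine behaves the same way for these simple patterns. The alternative pattern ([^,\n]+) is a suggestion, not part of the original answer.

```python
import re

countries = "IN,US,CA,ALL"

# Pattern from the answer: each item must be followed by ',' or a newline.
with_delimiter = re.findall(r"(.+?)[,\n]", countries)

# Alternative (an assumption, not from the answer): match runs of non-delimiter characters.
without_delimiter = re.findall(r"([^,\n]+)", countries)

print(with_delimiter)
print(without_delimiter)
```

If the first list is missing ALL, either append a trailing comma to the variable or use the second pattern.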

Split url code in c#

I want to split a URL using the following code:
string url="http://images/newyork/1550/t_2911340.JPG";
file_name=server.MapPath("~/storedImages/")+"t_2911340.gif";
save_file_from_url(file_name,url);
But I want my code like this:
file_name=server.MapPath("~/storedImages/") +
( values after last / from url and before ) +
gif // by adding gif i want to rename it
Can you help me to split the code and append it?
Thanks in advance.
See the System.Uri class. Construct an instance of System.Uri, passing your URL string to the constructor. Then access the various properties of the Uri object as your "split" URL. To further split the path portion of the URL into segments, use the Segments Property.
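The split-and-rename logic itself is small. Here it is sketched in Python for illustration only (the question is about C#, where System.Uri would play the role of urlsplit); the function name gif_name_from_url is made up. Note that changing the extension only renames the file and does not convert the image data.

```python
import posixpath
from urllib.parse import urlsplit

def gif_name_from_url(url):
    """Take the last path segment of a URL and swap its extension for .gif."""
    path = urlsplit(url).path            # e.g. /newyork/1550/t_2911340.JPG
    base = posixpath.basename(path)      # text after the last '/'
    stem, _ext = posixpath.splitext(base)
    return stem + ".gif"

print(gif_name_from_url("http://images/newyork/1550/t_2911340.JPG"))
```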