Map special characters to URL safe, readable versions [closed] - perl

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I am looking for a mapping table, a Perl module, or anything else that maps characters to a URL-safe version that is still readable.
I need to build URLs without any special characters. The base words are city names in their native language, which means they can contain special characters from that language.
For example, given the Polish city name 'łódź', I need a readable version like 'lodz'.

The major browsers show and accept non-ASCII characters in the URL bar even if they need to be encoded during transmission.
For example,
http://.../city/Montr%C3%A9al
will appear as
http://.../city/Montréal
in the browser's URL bar.
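To make the encoded form concrete, here is a small sketch in Python (not Perl, purely for illustration) using the standard library's `urllib.parse`. It only shows the round trip between the readable and the percent-encoded path segment; the city name is the one from the example above.

```python
from urllib.parse import quote, unquote

city = "Montréal"

# quote() percent-encodes the UTF-8 bytes of every non-ASCII character.
encoded = quote(city)
print(encoded)           # Montr%C3%A9al

# unquote() reverses it, which is roughly what the URL bar displays.
print(unquote(encoded))  # Montréal
```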
But if you want to convert to a subset of ASCII, you'd start by using Text::Unidecode's unidecode. Then you have to decide what to do with the remaining characters that must be escaped in URLs.
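As a rough illustration of the two steps (transliterate, then deal with whatever is still unsafe), here is a sketch in Python using only the standard library. It is not Text::Unidecode and covers far fewer characters: the `EXTRA` table and the `slugify` name are made up for this example, and replacing leftover characters with hyphens is just one possible policy.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny table for letters that have no Unicode
# decomposition (a real transliteration library covers many more).
EXTRA = {"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "đ": "d", "Đ": "D",
         "ß": "ss", "æ": "ae", "Æ": "AE"}

def slugify(name: str) -> str:
    # Step 1a: map letters that NFKD cannot split into base + accent.
    mapped = "".join(EXTRA.get(ch, ch) for ch in name)
    # Step 1b: decompose accented letters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", mapped)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII is silently dropped in this sketch.
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    # Step 2: collapse every run of URL-unsafe characters into one hyphen.
    return re.sub(r"[^A-Za-z0-9]+", "-", ascii_only).strip("-").lower()

print(slugify("łódź"))      # lodz
print(slugify("Montréal"))  # montreal
```

Dropping unmapped characters is lossy, so names in non-Latin scripts would come out empty here; that is where a full transliteration table earns its keep.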

Related

What type of password hash [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
!MJXAy.... (41 characters, A-Z a-z 0-9)
I have a list of passwords from an older web app, but I no longer have the source for creating or verifying the hashes.
I think it may have been a Django app, but I can't be sure.
I'm looking for either the type of hash or a link to source code so I can validate logins.
Any thoughts or suggestions?
You might simply have 40-character hashes that have been "disabled".
! and * are often placed at the beginning or end of a hash to reversibly "disable" accounts (because those characters never appear in most hashes, thereby making any hash containing them impossible to match).
If the account ever needs to be reactivated, the obviously invalid "disabling" character can simply be removed, restoring the original password without having to know the original password or interact with the user.
If all of the hashes start with an exclamation point, perhaps all of the hashes were disabled on purpose for some reason.
And for what it's worth, the two Django formats that I'm aware of look like this (from the Hashcat example-hashes page, both plaintexts 'hashcat'):
Django (PBKDF2-SHA256):
pbkdf2_sha256$20000$H0dPx8NeajVu$GiC4k5kqbbR9qWBlsRgDywNqC2vd9kqfk7zdorEnNas=
Django (SHA-1):
sha1$fe76b$02d5916550edf7fc8c886f044887f4b1abf9b013
Both of these are salted and don't seem to match your character set. It's possible that your hash is a custom one, with the final step being a base64 conversion (but without the trailing =?).
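For reference, here is a sketch of how strings in those two layouts could be checked in Python with only the standard library. It assumes the fields are `algorithm$iterations$salt$base64(digest)` for PBKDF2 and `algorithm$salt$hex(sha1(salt + password))` for SHA-1, which is my reading of the formats above rather than something verified against the old app; the function names are made up.

```python
import base64
import hashlib
import hmac

def django_pbkdf2_sha256(password: str, salt: str, iterations: int) -> str:
    # Assumed layout: pbkdf2_sha256$<iterations>$<salt>$<base64 digest>
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("utf-8"), iterations
    )
    b64 = base64.b64encode(digest).decode("ascii")
    return f"pbkdf2_sha256${iterations}${salt}${b64}"

def django_sha1(password: str, salt: str) -> str:
    # Assumed layout: sha1$<salt>$<hex of sha1(salt + password)>
    digest = hashlib.sha1((salt + password).encode("utf-8")).hexdigest()
    return f"sha1${salt}${digest}"

def verify(password: str, encoded: str) -> bool:
    # Recompute the hash with the stored parameters and compare.
    algorithm, rest = encoded.split("$", 1)
    if algorithm == "pbkdf2_sha256":
        iterations, salt, _ = rest.split("$", 2)
        candidate = django_pbkdf2_sha256(password, salt, int(iterations))
    elif algorithm == "sha1":
        salt, _ = rest.split("$", 1)
        candidate = django_sha1(password, salt)
    else:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hmac.compare_digest(candidate, encoded)

# Try it against the two example strings quoted above.
print(verify("hashcat", "sha1$fe76b$02d5916550edf7fc8c886f044887f4b1abf9b013"))
print(verify("hashcat", "pbkdf2_sha256$20000$H0dPx8NeajVu$"
                        "GiC4k5kqbbR9qWBlsRgDywNqC2vd9kqfk7zdorEnNas="))
```

A string with no `$` separators at all, like the one in the question, would not fit either layout.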
The closest I can see are an ASCII MD5 hash converted to base64 and a binary SHA-256 hash converted to base64, but both of those are 44 characters.
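The 44-character figure is easy to check. A quick sketch in Python; the plaintext is just a placeholder, since the output length does not depend on it.

```python
import base64
import hashlib

password = b"hashcat"  # placeholder plaintext

# MD5 as 32 ASCII hex characters, then base64-encoded.
md5_hex_b64 = base64.b64encode(hashlib.md5(password).hexdigest().encode("ascii"))

# SHA-256 as 32 raw bytes, then base64-encoded.
sha256_raw_b64 = base64.b64encode(hashlib.sha256(password).digest())

# Both inputs are 32 bytes, so both outputs are 44 characters
# (43 plus one trailing "=" of padding).
print(len(md5_hex_b64), len(sha256_raw_b64))  # 44 44
```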
Your best bet may be to trim off the ! and try to verify them with mdxfind, which will try many different kinds and chains of hashing, encoding, and truncation.

How does compiler understand Unicode characters so quickly? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I recently wrote a document-based program.
What intrigues me is how a compiler (in my case, Objective-C) can convert any character into Unicode so fast when these characters are only visual representations.
I assume A-Z and other common characters can be converted from ASCII to Unicode very easily. But what about special characters such as brand icons and the copyright sign?
I am solely interested in the internal workings of such a conversion.
Example:
How does the compiler understand what "©" is in the blink of an eye? Is it by looking it up in a Unicode table? But if I have 1000000 "©" characters, does my compiler look them up in the table 1000000 times? That is very time-consuming, isn't it?
The compiler doesn't see "©". It sees whatever numerical representation of "©" occurs in the source file it's processing. No lookup is needed, because it's already in the form the compiler uses. (Some conversions might be needed if, for example, the source file is in UTF-8 and the compiler uses UTF-32 internally, but such conversions don't require a full Unicode table.)
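To illustrate why no table is involved, here is a sketch (in Python, not what any particular compiler does internally) of turning the UTF-8 bytes of one character into its code point. It is only bit masking and shifting; the function name is made up, and it skips the validation a real decoder would do on the continuation bytes.

```python
def decode_utf8_char(data: bytes) -> int:
    """Return the code point of the first UTF-8 sequence in data."""
    first = data[0]
    if first < 0x80:                 # 0xxxxxxx: plain ASCII, one byte
        return first
    if first >> 5 == 0b110:          # 110xxxxx: one continuation byte
        extra, cp = 1, first & 0x1F
    elif first >> 4 == 0b1110:       # 1110xxxx: two continuation bytes
        extra, cp = 2, first & 0x0F
    elif first >> 3 == 0b11110:      # 11110xxx: three continuation bytes
        extra, cp = 3, first & 0x07
    else:
        raise ValueError("not a valid UTF-8 lead byte")
    for byte in data[1:1 + extra]:
        # Each continuation byte (10xxxxxx) contributes its low 6 bits.
        cp = (cp << 6) | (byte & 0x3F)
    return cp

raw = "©".encode("utf-8")            # the two bytes C2 A9 in the source file
print(hex(decode_utf8_char(raw)))    # 0xa9, i.e. U+00A9
```

A million "©" characters just means running those few shifts a million times.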

How to convert character to unicode? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have this character.
&#8211
How do I convert this character to Unicode?
Sorry if it is a silly question.
It's not a silly question; character encoding can be tricky to get your head around. I highly recommend reading "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" (I'm sure you can guess the topic).
Unicode itself isn't an encoding, it's a very long list of characters and code points. What I'm guessing you want to do is display the dash character in some way. Where are you wanting to display or store the data? If it's in a browser, then that representation should work as that's the HTML encoded version. If you want to store it in a database then you'll need to convert that encoded version to a string and then convert that string to whatever encoding the database is using.
Take a look at this page, which shows the character in different encodings:
http://www.fileformat.info/info/unicode/char/2013/index.htm
Each language has its own rules for how to write it in a string or char literal, though.
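As one example of "convert the encoded version to a string, then to whatever encoding you need", here is a sketch in Python. The question's text lacks the closing semicolon, so the sketch adds it; everything else comes from the standard library.

```python
import html
import unicodedata

entity = "&#8211;"  # the reference from the question, with its semicolon

# Turn the HTML numeric character reference into an actual character.
char = html.unescape(entity)

print(f"U+{ord(char):04X}")     # U+2013 (8211 in decimal)
print(unicodedata.name(char))   # EN DASH

# The same character in two concrete encodings.
print(char.encode("utf-8"))     # b'\xe2\x80\x93'
print(char.encode("utf-16-be")) # b' \x13' (bytes 0x20 0x13)
```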

Looking for special characters in Google [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
Do you know how to search for special characters with Google?
I'm looking at bash code that uses the ## operator. I would like to know what it does, but I wasn't able to figure out a way to protect the characters (I'm not sure it's even possible).
This is particularly annoying when you're looking for code patterns, because some characters are always ignored.
Update: this answer is no longer applicable as of 2017. See https://blog.google/products/search/improvements-searching-special-characters-programming-languages/
Google strips most punctuation from queries, as described here, so it won't help you with the bash syntax.
It's very easy to search for the string "##" in the bash documentation: Just run "info bash", hit "s", and enter "##" as the search string.
Google strips punctuation, IMHO, because:
some characters are used for special searches (- to exclude, + to add, and 10..20 to specify a range)
it keeps spammers from harvesting email addresses (characters like # or .)
In my experience, it's not even possible to escape special characters.
The only solution I've found so far is to use Yahoo: http://it.search.yahoo.com/

Unicode "end of story" [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 13 years ago.
I'm looking for a good character that means "end-of-story" in Unicode. I remember seeing one once that looked like a fractal and was really cool. Does anyone know where I can find this character? More importantly, where can I go to find a Unicode character with a special meaning when I don't know its name? Google wasn't very helpful.
Edit: I found something that looks kinda like a fractal, and also happens to be called "end-of-story." It's a Thai character.
Is this what you were looking for?
http://www.decodeunicode.org/en/u+0e5b/data/k//XS/khomut31910809.jpg
End of story: The Khomut sign is a terminal punctuation character which is placed in old books at the end of a verse in a poem, the end of a chapter, or the end of a story.
Compare to U+17DA Khmer Sign Koomuut
Btw: I found this with a Google Image Search on "end of story" unicode; it was the 4th result. That's probably the best way to search for any kind of symbol. Without the name of the character it would probably have been impossible to find, though, since a search for unicode fractal didn't return anything useful.
Go and have a look at the unicode.org code charts. You can browse through them and find a character that you like by what they look like. http://www.unicode.org/charts/
Alternatively, browse through the names of the characters using the data file that has the official character name. Do a search using your browser or editor search function. http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt
When you find a character and want to see what it looks like, just search for its character code. For example, character 0087 (the first field in the UnicodeData.txt file) is searched as U+0087. FileFormat.info usually has all of the characters. For example, END OF SELECTED AREA.
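If you'd rather search the names programmatically than in a browser, here is a sketch in Python. Its `unicodedata` module ships its own copy of the character names (the Unicode version depends on the Python build, and control characters such as U+0087 have no name there), and `find_by_name` is just a made-up helper.

```python
import sys
import unicodedata

def find_by_name(fragment: str) -> list[tuple[str, str]]:
    """List (code point, name) pairs whose official name contains fragment."""
    fragment = fragment.upper()
    hits = []
    for cp in range(sys.maxunicode + 1):
        # The second argument is returned for code points without a name.
        name = unicodedata.name(chr(cp), "")
        if fragment in name:
            hits.append((f"U+{cp:04X}", name))
    return hits

for code, name in find_by_name("khomut"):
    print(code, name)

# Going the other way, from an exact name to the character:
print(unicodedata.lookup("KHMER SIGN KOOMUUT"))
```

This only helps once you can guess a word in the name; for "looks like a fractal", browsing the charts is still the way to go.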
Are you using Windows? Use the Character Map (Start | Accessories | System Tools). I personally like the Greek Omega (U+03A9) or the Ohm sign which is an Omega (U+2126).