What exactly does 'Type Body Length' mean in Swiftlint? - swift

We just added Swiftlint to our project and we want to follow all the rules but I'm not sure what's meant by 'type_body_length' warning. I'm not a native english speaker so I find it a bit confusing.
There is a rule for file length aswell so how do they differ? What falls under this definition?

type_body_length violation means that the class has too many lines in it. I dont think it counts extensions, comments or whitespace
Type name should only contain alphanumeric characters, start with an uppercase character and span between 3 and 40 characters in length.
The rules documentation linked here and above also gives examples of what would and wouldn't be accepted (Triggering & Non Triggering). - Edit suggested by #GoodSp33d, thanks

Related

Multiple regex in one command

Disclaimer: I have no engineering background whatsoever - please don't hold it against me ;)
What I'm trying to do:
Scan a bunch of text strings and find the ones that
are more than one word
contain title case (at least one capitalized word after the first one)
but exclude specific proper nouns that don't get checked for title case
and disregard any parameters in curly brackets
Example: Today, a Man walked his dogs named {FIDO} and {Fifi} down the Street.
Expectation: Flag the string for title capitalization because of Man and Street, not because of Today, {FIDO} or {Fifi}
Example: Don't post that video on TikTok.
Expectation: No flag because TikTok is a proper noun
I have bits and pieces, none of them error-free from what https://www.regextester.com/ keeps telling me so I'm really hoping for help from this community.
What I've tried (in piece meal but not all together):
(?=([A-Z][a-z]+\s+[A-Z][a-z]+))
^(?!(WordA|WordB)$)
^((?!{*}))
I think your problem is not really solvable solely with regex...
My recommendation would be splitting the input via [\s\W]+ (e.g. with python's re.split, if you really need strings with more than one word, you can check the length of the result), filtering each resulting word if the first character is uppercase (e.g with python's string.isupper) and finally filtering against a dictionary.
[\s\W]+ matches all whitespace and non-word characters, yielding words...
The reasoning behind this different approach: compiling all "proper nouns" in a regex is kinda impossible, using "isupper" also works with non-latin letters (e.g. when your strings are unicode, [A-Z] won't be sufficient to detect uppercase). Filtering utilizing a dictionary is a way more forward approach and much easier to maintain (I would recommend using set or other data type suited for fast lookups.
Maybe if you can define your use case more clearer we can work out a pure regex solution...

What are allowed characters for a submodule name in registerModule?

registerModule() expects a submodule key as a third parameter.
I think it should probably not contain a space and only alphabetic characters (or alphanumeric?) and underscore ('_'), but I'm not really sure.
I could not find specific information for this.
The function makes use of \TYPO3\CMS\Core\Utility\GeneralUtility::underscoredToUpperCamelCase to generate the full module name combined of main module and sub module connected with an _
So you already guessed the right answer.
It's a bit complicated strange to answer!
Official API document does not provide exact information. I have worked around some extension which has multiple sub-module. I'm quite sure this not allow special character as you sub-module key.
eg. web_TestTestbe123 (mainModulename_subModuleKey)
I have noticed bellow characteristic for the key:
Key must be lowercase
No space allowed
Numerica value would be fine
Does this make sense?
I found this in the documentation just now:
Backend modules
1. The modkey is made up of alphanumeric characters only. It does not contain underscores and starts with a letter.
https://docs.typo3.org/m/typo3/reference-coreapi/master/en-us/ExtensionArchitecture/NamingConventions/Index.html

Any way to extend org-mode tags to include characters like "-"?

BRIEF: can the characters permitted in org-mode tags be extenced in any way?
E.g. to include -, dashes?
DETAIL:
I see
http://orgmode.org/org.html#Tags
Tags are normal words containing letters, numbers, ‘_’, and ‘#’. Tags
must be preceded and followed by a single colon, e.g., ‘:work:’.
I am somewhat surprised that this is not extensible. Is it, and I have missed it?
TODO keywords can include dashes. Occasionally I would like to treat TODOs as intermiscible wth tags - e.g. move a TODO to a tag, and vice versa - but this syntax difference gets in the way.
Before I start coding, does anyone know why dashes are not allowed? I conjecture confusion with timestamps.
Not certain, but it is likely because certain agenda search strings use "-" as an operator, and there is not yet a way to escape those using "-".
I have used hyphens for properties and spaces for todo kw with no problems for my purposes, but I have not tried them in searches.
The new exporter will be included in 8.0 and with it a detailed description of syntax. If you want to file a feature request to allow hyphens, perhaps by allowing escaping in search strings, now is the time to do it.

What Unicode characters are dangerous?

What Unicode characters (more precisely codepoints) are dangerous and should be blacklisted and prohibited for the users to use?
I know that BIDI override characters and the "zero width space" are very prone to make problems, but what others are there?
Thanks
Characters aren’t dangerous: only inappropriate uses of them are.
You might consider reading things like:
Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax
RFC 3454: Preparation of Internationalized Strings (“stringprep”)
It is impossible to guess what you mean by dangerous.
A Golden Rule in security is to whitelist instead of blacklist, instead of trying to cover all bad characters, it is a much better idea to validate based on ensuring the user only use known good characters.
There are solutions that help you build the large whitelist that is required for international whitelisting. For example, in .NET there is UnicodeCategory.
The idea is that instead of whitelisting thousands of individual characters, the library assigns them into categories like alphanumeric characters, punctuations, control characters, and such.
Tutorial on whitelisting international characters in .NET
Unicode Regex: Categories
'HANGUL FILLER' (U+3164)
Since Unicode 1.1 in 1993, there is an empty wide, zero space character.
We can't see it, neither copy/paste it alone because we can't select it!
It need to be generated, by the unix keyboard shortcut: CTRL + SHIFT + u + 3164
It can pretty much 💩 up anything: variables, function name, url, file names, mimic DNS, invalidate hash strings, database entries, blog posts, logins, allow to fake identical accounts, etc.
DEMO 1: Altering variables
The variable hijacked contains a Hangul Filler char, the console log call the variable without the char:
const normal = "Hello w488ld"
const hijaㅤcked = "Hello w488ld"
console.log(normal)
console.log(hijacked)
DEMO 2: Hijack URL's
Those 3 url will lead to xn--stackoverflow-fr16ea.com:
https://stackㅤㅤoverflow.com
https://stackㅤㅤoverflow.com
https://stackㅤㅤoverflow.com
See Unicode Security Considerations Report.
It covers various aspects, from spoofing of rendered strings to dangers of processing UTF encodings in unsafe languages.
U+2800 BRAILLE PATTERN BLANK - a Braille character without any "dots". It looks like a regular "space" but is not classified as one.

Japanese COBOL Code: rules for G literals and identifiers?

We are processing IBMEnterprise Japanese COBOL source code.
The rules that describe exactly what is allowed in G type literals,
and what are allowed for identifiers are unclear.
The IBM manual indicates that a G'....' literal
must have a SHIFT-OUT as the first character inside the quotes,
and a SHIFT-IN as the last character before the closing quote.
Our COBOL lexer "knows" this, but objects to G literals
found in real code. Conclusion: the IBM manual is wrong,
or we are misreading it. The customer won't let us see the code,
so it is pretty difficult to diagnose the problem.
EDIT: Revised/extended below text for clarity:
Does anyone know the exact rules of G literal formation,
and how they (don't) match what the IBM reference manuals say?
The ideal answer would a be regular expression for the G literal.
This is what we are using now (coded by another author, sigh):
#token non_numeric_literal_quote_g [STRING]
"<G><squote><ShiftOut> (
(<NotLineOrParagraphSeparatorNorShiftInNorShiftOut>|<squote><squote>|<ShiftOut>)
(<NotLineOrParagraphSeparator>|<squote><squote>)
| <ShiftIn> ( <NotLineOrParagraphSeparatorNorApostropheNorShiftInNorShiftOut>|
<ShiftIn>|<ShiftOut>)
| <squote><squote>
)* <ShiftIn><squote>"
where <name> is a macro that is another regular expression. Presumably they
are named well enough so you can guess what they contain.
Here is the IBM Enterprise COBOL Reference.
Chapter 3 "Character Strings", subheading "DBCS literals" page 32 is relevant reading.
I'm hoping that by providing the exact reference, an experienced IBMer can tell us how we misread it :-{ I'm particularly unclear on what the phrase "DBCS-characters" means
when it says "one or more characters in the range X'00...X'FF for either byte"
How can DBCS-characters be anything but pairs of 8-bit character codes?
The existing RE matches 3 types of pairs of characters if you examine it.
One answer below suggests that the <squote><squote> pairing is wrong.
OK, I might believe that, but that means the RE would only reject
literal strings containing single <squote>s. I don't believe that's
the problem we are having as we seem to trip over every instance of a G literal.
Similarly, COBOL identifiers can apparantly be composed
with DBCS characters. What is allowed for an identifier, exactly?
Again a regular expression would be ideal.
EDIT2: I'm beginning to think the problem might not be the RE.
We are reading Shift-JIS encoded text. Our reader converts that
text to Unicode as it goes. But DBCS characters are really
not Shift-JIS; rather, they are binary-coded data. Likely
what is happening is the that DBCS data is getting translated
as if it were Shift-JIS, and that would muck up the ability
to recognize "two bytes" as a DBCS element. For instance,
if a DBCS character pair were :81 :1F, a ShiftJIS reader
would convert this pair into a single Unicode character,
and its two-byte nature is then lost. If you can't count pairs,
you can't find the end quote. If you can't find the end quote,
you can't recognize the literal. So the problem would appear
to be that we need to switch input-encoding modes in the middle
of the lexing process. Yuk.
Try to add a single quote in your rule to see if it passes by making this change,
<squote><squote> => <squote>{1,2}
If I remember it correctly, one difference between N and G literals is that G allows single quote. Your regular expression doesn't allow that.
EDIT: I thought you got all other DBCS literals working and just having issues with G-string so I just pointed out the difference between N and G. Now I took a closer look at your RE. It has problems. In the Cobol I used, you can mix ASCII with Japanese, for example,
G"ABC<ヲァィ>" <> are Shift-out/shift-in
You RE assumes the DBCS only. I would loose this restriction and try again.
I don't think it's possible to handle G literals entirely in regular expression. There is no way to keep track of matching quotes and SO/SI with a finite state machine alone. Your RE is so complicated because it's trying to do the impossible. I would just simplify it and take care of mismatching tokens manually.
You could also face encoding issues. The code could be in EBCDIC (Katakana) or UTF-16, treating it as ASCII will not work. SO/SI sometimes are converted to 0x1E/0x1F on Windows.
I am just trying to help you shoot in the dark without seeing the actual code :)
Does <NotLineOrParagraphSeparatorNorApostropheNorShiftInNorShiftOut> also include single and double quotation marks, or just apostrophes? That would be a problem, as it would consume the literal closing character sequence >' ...
I would check the definition of all other macros to make sure. The only obvious problem that I can see is the <squote><squote> that you already seem to be aware of.