Why C11 define character constant recursively? - character

The character constants are defined in c11 as:
 Syntax
  character-constant:
   ' c-char-sequence '
   L' c-char-sequence '
   u' c-char-sequence '
   U' c-char-sequence '
  c-char-sequence:
   c-char
   c-char-sequence c-char
  c-char:
   any member of the source character set except the single-quote ', backslash \, or new-line character
   escape-sequence
It is defined recursively, so inside the single-quotes, there are one or more c-chars, like 'abc'.
However as I know, a character constant contains only one c-char, like 'a', doesn't it?

as I know, a character constant contains only one c-char, like 'a', doesn't it?
no, 'abcd' is also a character constant. Its value is technically implementation-defined, but everywhere I've looked it was formed out of the values of the chars, in big-endian order (in that case, 0x61626364)
The C side of cppreference has a discussion of various character constants

Related

How to match exact string in perl

I am trying to parse all the files and verify if any of the file content has strings TESTDIR or TEST_DIR
Files contents might look something like:-
TESTDIR = foo
include $(TESTDIR)/chop.mk
...
TEST_DIR := goldimage
MAKE_TESTDIR = var_make
NEW_TEST_DIR = tesing_var
Actually I am only interested in TESTDIR ,$(TESTDIR),TEST_DIR but in my case last two lines should be ignored. I am new to perl , Can anyone help me out with re-rex.
/\bTEST_?DIR\b/
\b means a "word boundary", i.e. the place between a word character and a non-word character. "Word" here has the Perl meaning: it contains characters, numbers, and underscores.
_? means "nothing or an underscore"
Look at "characterset".
Only (space) surrounding allowed:
/^(.* )?TEST_?DIR /
^ beginning of the line
(.* )? There may be some content .* but if, its must be followed by a space
at the and says that a whitespace must be there. Otherwise use ( .*)?$ at the end.
One of a given characterset is allowed:
Should the be other characters then a space be possible you can use a character class []:
/^(.*[ \t(])?TEST_?DIR[) :=]/
(.*[ \t(])? in front of TEST_?DIR may be a (space) or a \t (tab) or ( or nothing if the line starts with itself.
afterwards there must be one of (space) or : or = or ). Followd by anything (to "anything" belongs the "=" of ":=" ...).
One of a given group is allowed:
So you need groups within () each possible group in there devided by a |:
/^(.*( |\t))?TEST_?DIR( | := | = )/
In this case, at the beginning is no change to [ \t] because each group holds only one character and \t.
At the end, there must be (single space) or := (':=' surrounded by spaces) or = ('=' surrounded by spaces), following by anything...
You can use any combination...
/^(.*[ \t(])?TEST_?DIR([) =:]| :=| =|)/
Test it on Debuggex.com. (Use 'PCRE')

Trying to understand this code (creating a [char]range)

I have code that works, but I have no idea WHY it works.
This will generate a list containing each letter of the English alphabet:
[char[]]([char]'a'..[char]'z')
However, this will not:
[char]([char]'a'..[char]'z')
and this will actually generate a list of numbers from 97 - 122
([char]'a'..[char]'z')
Could any experts out there explain to me how this works (or doesn't)?
In your second example, you are trying to cast an array of characters to a single character [char]. That won't work. In the third example, the 'a' is considered a string by PowerShell. So casting it to [char] tells PowerShell it is a single char. The .. operator ranges over numbers. Fortunately, PowerShell can convert the character 'a' to its ASCII value 97 and 'z' to 122. So you effectively wind up with 97..122. Then in your first example, the [char[]] converts that array of ints back to an array of characters: a through z.
In Powershell 'a' is a [string] type. [char]'a' is, obviously a [char] type. These are very different things.
$string = 'a'
$char = [char]$string
$string can be cast as a [char] because it is a string, consisting of a single character. If there is more than one character in the string, e.g. 'ab' then you need an array of [chars], which is type [char[]]. The extra set of square brackets designates an array.
$string | get-member
$char | get-member
reveals much different methods for the two types. The [char] type has .toint() methods. If you cast it as [int], it assumes the numeric ASCII code for that character.
[int]$char
returns 97, the ASCII code for the letter 'a'.

why can the character's order in regex expression affect sed?

The tv.txt file is as following:
mms://live21.gztv.com/gztv_gz 广州台[可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3]
mms://live21.gztv.com/gztv_news 广州新闻台·直播广州(可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3)
mms://live21.gztv.com/gztv_kids 广州少儿台(可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3)
mms://live21.gztv.com/gztv_econ 广州经济台
I want to group it into three groups.
sed -r 's/([^ ]*)\s([^][()]*)((\(.+\))*|(\[.+\])*)/\3/' tv.txt
got the result:
[可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3]
(可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3)
(可于Totem/VLC/MPlayer播放,记得把高宽比设置成4:3)
When I write it into
sed -r 's/([^ ]*)\s([^][()]*)((\(.+\))*|(\[.+\])*)/\3/' tv.txt
It can't work.
The only difference is [^][()] and [^[]()]; neither of the [^\[\]()] ,escape characters can not make it run properly.
I want to know the reason.
The POSIX rules for getting ] into a character class are a little arcane, but they make sense when you think about it hard.
For a positive (non-negated) character class, the ] must be the first character:
[]and]
This recognizes any character a, n, d or ] as part of the character class.
For a negated character class, the ] must be the first character after the ^:
[^]and]
This recognizes any character except a, n, d or ] as part of the character class.
Otherwise, the first ] after the [ marks the end of the character class. Inside a character class, most of the normal regex special characters lose their special meaning, and others (notably - minus) acquire special meanings. (If you want a - in a character class, it has to be 'first' or last, where 'first' means 'after the optional ^ and only if ] is not present'.)
In your examples:
[^][()] — this is a negated character class that recognizes any character except [, ], ( or ), but
[^[]()] — this is a negated character class that recognizes any character except [, followed by whatever () symbolizes in the regex family you're using, and ] which represents itself.

How do I print a tab character in Pascal?

I'm trying to figure out in all the Internets what's the special character for printing a simple tab in Pascal. I have to format a table in a CLI program and that would be handy.
Single non printable characters can be constructed using their ascii code prefixed with #
Since the ascii value for tab is 9, a tab is then #9. Characters such constructed must be outside literals, but don't need + to concatenate:
E.g.
const
sometext = 'firstfield'#9'secondfield'#13#10;
contains two fields separated by a tab, ended by a carriage return (#13) + a linefeed #10
The ' character can be made both via this route, or shorter by just ending the literal and reopening it:
const
some2 = '''bla'''; // will contain 'bla' with the ticks.
some3 = 'start''bla''end'; // will contain start'bla'end
write( ^i );
:-)

Matlab strcat function troubles with spaces

I'm trying to accomplish this:
strcat('red ', 'yellow ', 'white ')
I expected to see "red yellow white", however, I see "redyellowwhite" on the command output. What needs to be done to ensure the spaces are concatenated properly? Thanks in advance.
Although STRCAT ignores trailing white space, it still preserves leading white space. Try this:
strcat('red',' yellow',' white')
Alternatively, you can just use the concatenation syntax:
['red ' 'yellow ' 'white ']
From the matlab help page for strcat:
"strcat ignores trailing ASCII white space characters and omits all such characters from the output. White space characters in ASCII are space, newline, carriage return, tab, vertical tab, or form-feed characters, all of which return a true response from the MATLAB isspace function. Use the concatenation syntax [s1 s2 s3 ...] to preserve trailing spaces. strcat does not ignore inputs that are cell arrays of strings. "
In fact, you can simply use the ASCII code of space: 32. So, you can solve the problem like this:
str = strcat('red', 32, 'yellow', 32, 'white');
Then you will get str = 'red yellow white'.
You can protect trailing whitespace in strcat() or similar functions by putting it in a cell.
str = strcat({'red '}, {'yellow '}, {'white '})
str = str{1}
Not very useful in this basic example. But if you end up doing "vectorized" operations on the strings, it's handy. Regular array concatenation doesn't do the 1-to-many concatenation that strcat does.
strs = strcat( {'my '}, {'red ','yellow ','white '}, 'shirt' )
Sticking 'my ' in a cell even though it's a single string will preserve the whitespace. Note you have to use the {} form instead of calling cellstr(), which will itself strip trailing whitespace.
This is all probably because Matlab has two forms of representing lists of strings: as a cellstr array, where all whitespace is significant, and as a blank-padded 2-dimensional char array, with each row treated as a string, and trailing whitespace ignored. The cellstr form most resembles strings in Java and C; the 2-D char form can be more memory efficient if you have many strings of similar length. Matlab's string manipulation functions are polymorphic on the two representations, and sometimes exhibit differences like this. A char literal like 'foo' is a degenerate one-string case of the 2-D char form, and Matlab's functions treat it as such.
UPDATE: As of Matlab R2019b or so, Matlab also has a new string array type. Double-quoted string literals make string arrays instead of char arrays. If you switch to using string arrays, it will solve all your problems here.
or you can say:
str = sprintf('%s%s%s', 'red ', 'yellow ', 'white ')