Garbage characters printed by vscode

Every time I use the terminal to print a string or any kind of character, a "%" is automatically printed at the end of each line. This happens every time I print something from C++ or PHP; I haven't tried other languages yet. I think it might be something with vscode, but I have no idea how it started or how to fix it.
#include <iostream>
using namespace std;
int test = 2;
int main()
{
    if (test < 9999) {
        test = 1;
    }
    cout << test;
}
Output:
musti@my-mbp clus % g++ main.cpp -o tests && ./tests
1%
Also, changing cout << test; to cout << test << endl; removes the % from the output.

Are you using zsh? Output that doesn't end with a newline (here, there is no endl) is considered a "partial line", so zsh shows an inverse-video % and then moves to the next line.
When a partial line is preserved, by default you will see an inverse+bold character at the end of the partial line: a ‘%’ for a normal user or a ‘#’ for root. If set, the shell parameter PROMPT_EOL_MARK can be used to customize how the end of partial lines are shown.
More information is available in the zsh documentation.
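To make the rule concrete, here is a simplified C model of the check. It is an assumption for illustration, not zsh's actual implementation: it tracks the cursor column from the bytes a program wrote and reports whether the output ends mid-line, which is when zsh would show the mark. leaves_partial_line is a hypothetical name, and the model ignores tabs, escape sequences, and multi-byte characters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model (assumption, not zsh source): returns true if the
 * bytes a command wrote leave the cursor away from column 0, i.e. the
 * output ended with a "partial line". */
bool leaves_partial_line(const char *out, size_t len)
{
    size_t column = 0;
    for (size_t i = 0; i < len; i++) {
        if (out[i] == '\n' || out[i] == '\r')
            column = 0;      /* newline or carriage return: back to column 0 */
        else
            column++;        /* any other byte advances the cursor */
    }
    return column != 0;      /* non-zero column: zsh would print the % mark */
}
```

With this model, the question's cout << test; writes just "1" and is flagged, while cout << test << endl; writes "1\n" and is not.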

Related

Write a lex program that detects and counts the pattern that starts with an uppercase letter, ends with a lowercase letter

I understood the problem and wrote the code below. It works: when I quit the program with Ctrl+Z, it prints the number of valid and invalid patterns detected.
Here is my code:
%{
int valid = 0;
int invalid = 0;
%}
%%
([A-Z][a-zA-Z0-9]*[a-z])* {valid++;}
[a-zA-Z0-9]* {invalid++;}
%%
int yywrap(void) { return 1; }
int main(int argc, char **argv)
{
printf("\n Enter inputs: \n\n");
yylex();
printf("\n\n\tNumber of VALID patterns = %d\n", valid);
printf("\tNumber of invalid patterns = %d\n\n", invalid);
return 0;
}
But I want something like this:
It should print the detected patterns, number of valid patterns and the number of invalid patterns whenever I input a new line.
There should be an EXIT command.
To achieve your goal, you should modify your code like this:
/*** Definition Section ***/
%{
int valid = 0;
int invalid = 0;
%}
/*** Rules Section ***/
%%
([A-Z][a-zA-Z0-9]*[a-z])* {printf("\n\tPattern Detected: %s ", yytext); valid++;}
[a-zA-Z0-9]* {invalid++;}
"\n" {
printf("\n\n\tNumber of VALID patterns = %d\n", valid);
printf("\tNumber of invalid patterns = %d\n\n", invalid);
valid = 0;
invalid = 0;
}
EXIT__ return 0;
%%
/*** User code section***/
int yywrap(void) { return 1; }
int main(int argc, char **argv)
{
printf("\n Enter inputs: \n\n");
yylex();
return 0;
}
The main changes are in the rules section.
Rule-1: ([A-Z][a-zA-Z0-9]*[a-z])* detects and counts valid patterns, i.e. those that start with an uppercase letter and end with a lowercase letter. Its action prints the detected pattern and increments the counter. yytext holds the text matched by the current rule; for this rule, it is the detected pattern.
Rule-2: [a-zA-Z0-9]* keeps track of invalid patterns. It also stops unmatched words from being echoed to the output by flex's default rule.
Rule-3: "\n" detects when you enter a new line. Its action prints the number of valid and invalid patterns for that line (the patterns themselves were already printed by Rule-1), then resets both counters to zero for the next line of input.
Rule-4: EXIT__: whenever you enter exactly this text, yylex() returns 0 and the program exits.
Because the counts are now printed per line, you no longer need to print them in main() in the user code section.
If you also want the totals of valid and invalid patterns printed at the end, the program will need a few more modifications.
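To sanity-check the classification outside flex, here is a minimal C sketch. is_valid_word and count_line are hypothetical helpers, not part of the flex program: they split one line into runs of [A-Za-z0-9] and classify each run the way Rules 1 and 2 do. Unlike the flex scanner, this sketch silently skips other characters instead of echoing them.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: a run is valid if it has at least two characters,
 * starts with an uppercase letter and ends with a lowercase letter,
 * mirroring [A-Z][a-zA-Z0-9]*[a-z]. */
static int is_valid_word(const char *w, size_t n)
{
    return n >= 2
        && isupper((unsigned char)w[0])
        && islower((unsigned char)w[n - 1]);
}

/* Counts valid and invalid alphanumeric runs in one line; any other
 * byte (space, punctuation) acts as a separator. */
void count_line(const char *line, int *valid, int *invalid)
{
    *valid = 0;
    *invalid = 0;
    size_t i = 0;
    while (line[i] != '\0') {
        if (!isalnum((unsigned char)line[i])) {
            i++;                        /* skip separators */
            continue;
        }
        size_t start = i;
        while (isalnum((unsigned char)line[i]))
            i++;                        /* consume one alphanumeric run */
        if (is_valid_word(line + start, i - start))
            (*valid)++;
        else
            (*invalid)++;
    }
}
```

For example, on the line Hello world ABc X it reports two valid words (Hello, ABc) and two invalid ones (world, X).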

Unicode Character in flex?

I have a simple question about two Unicode characters that I want to use in my programming language: for assignment I want to use the old APL symbols ← and →.
My flex-file (snazzle.l) looks like the following:
/** phi@gress.ly 2017 **/
/** parser for omni programming language. **/
%{
#include <iostream>
using namespace std;
#define YY_DECL extern "C" int yylex()
int linenum = 0;
%}
%%
[\n] {++linenum;}
[ \t] ;
[0-9]+\.[0-9]+([eE][+-]?[0-9]+)? { cout << linenum << ". Found a floating-point number: " << yytext << endl; }
\"[^\"]*\" { cout << linenum << ". Found string: " << yytext << endl; }
[0-9]+ { cout << linenum << ". Found an integer: " << yytext << endl; }
[a-zA-Z0-9]+ { cout << linenum << ". Found an identifier: " << yytext << endl; }
([\←])|([\→])|(:=)|(=:) { cout << linenum << ". Found assignment operator: " << yytext <<endl; }
[\;] { cout << linenum << ". Found statement delimiter: " << yytext <<endl; }
[\[\]\(\)\{\}] { cout << linenum << ". Found parantheses: " << yytext << endl; }
%%
int main() {
// lex through the input:
yylex();
}
When I "snazzle" the following input:
x → y;
I get the assignment character (a) wrong and (b) three times:
0. Found an identifier: x
0. Found assignment operator: �
0. Found assignment operator: �
0. Found assignment operator: �
0. Found an identifier: y
0. Found statement delimiter: ;
How can I add ← and → as possible flex-characters?
Flex produces eight-bit clean scanners; that is, it can handle any input consisting of arbitrary octets. It knows nothing about UTF-8 or Unicode codepoints, but that doesn't stop it from recognizing a Unicode input character as a sequence of octets (not a single character). Which sequence it will be depends on which Unicode encoding you are using, but assuming that your files are UTF-8, → will be the three bytes e2 86 92 and ← will be e2 86 90.
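To see where those byte sequences come from, here is a short C sketch of UTF-8 encoding. utf8_encode is a hypothetical helper, not a flex or libc function. → is U+2192 and ← is U+2190, both in the three-byte range, so each becomes three octets.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes one Unicode code point as UTF-8 into out[0..3].
 * Returns the number of bytes written, or 0 for an invalid code point. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* UTF-16 surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond U+10FFFF */
}
```

Encoding 0x2192 yields e2 86 92, and 0x2190 yields e2 86 90, which are the sequences flex actually sees.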
You don't actually have to know that, however; you can just put the UTF-8 sequence into your flex pattern. You don't even need to quote it, although it is probably a good idea because it will prove less confusing if you end up using regular expression operators. Here I really mean quote it, as in "←". \← will not do what you expect, because the \ only applies to the next octet (as I said, flex knows nothing about Unicode encodings), which is only the first of the three bytes in that symbol. In other words, "←"? really means "an optional left-arrow", while \←? means "the two octets \xE2 \x86 optionally followed by \x90". I hope that's clear.
Flex character classes are not useful for Unicode sequences (or any other multi-character sequence) because a character class is a set of octets. So if you write [←], flex will interpret that as "one of the octets \xE2, \x86 or \x90". [Note 1]
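This also accounts for the three outputs in the question. The C sketch below (count_class_matches is a hypothetical helper) treats a character class as a plain set of octets, as flex does, and counts how many single-byte tokens [←→] would produce on x → y;. Only the class rule is modelled; the sketch assumes no other rule claims bytes 0x80 and above, which holds for the question's scanner.

```c
#include <stddef.h>

/* Models a flex character class as a set of octets: returns nonzero if
 * byte c appears anywhere in the class body (which may be UTF-8 text). */
static int in_byte_class(const char *class_body, unsigned char c)
{
    for (const char *p = class_body; *p != '\0'; p++)
        if ((unsigned char)*p == c)
            return 1;
    return 0;
}

/* Counts how many one-byte tokens a class rule would match in input. */
size_t count_class_matches(const char *input, const char *class_body)
{
    size_t n = 0;
    for (const char *p = input; *p != '\0'; p++)
        if (in_byte_class(class_body, (unsigned char)*p))
            n++;
    return n;
}
```

For x → y; (written with escapes as "x \xE2\x86\x92 y;") and the class body "←→", the count is 3: one match each for e2, 86, and 92.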
Notes
It is rarely necessary to backslash-escape characters inside flex character classes; the only character which must be backslash-escaped is the backslash itself. It is not an error to escape characters which don't need escaping, so flex won't complain about it, but it makes the character classes hard for humans to read (at least, for this human to read). So [\←] means exactly the same as [←] and you could write [\[\]\(\)\{\}] as [][)(}{]. (] does not close a character class if it is the first character in the class, so it is conventional to write parentheses "face-to-face").
It is also not necessary to parenthesize character sequences inside alternatives, so you could write ([\←])|([\→])|(:=)|(=:) as ←|→|:=|=:. Or, if you prefer, "←"|"→"|":="|"=:". Of course, you wouldn't usually do that, since the scanner normally informs the parser about each individual operator. If your intention is to make ← a synonym of :=, then you would probably end up with:
←|:= { return LEFT_ARROW; }
→|=: { return RIGHT_ARROW; }
Rather than inserting printf actions in your scanner specification, you would be better off asking flex to put your scanner in debug mode. That is as simple as adding -d to the flex command line when you are building your scanner. See the flex manual section on debugging for more details.
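For comparison, here is how the ←|:= and →|=: rules from the second note could look in a hand-written C scanner. This is a sketch: match_assign and the enum values are hypothetical, and in a flex/bison build LEFT_ARROW and RIGHT_ARROW would come from the parser's generated header. As with "←" in flex, each arrow is compared as a whole three-byte sequence, never as a single character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for illustration only. */
enum token { NO_MATCH = 0, LEFT_ARROW, RIGHT_ARROW };

/* Tries to match an assignment operator at p. On success returns the
 * token and stores the number of bytes consumed in *len. */
enum token match_assign(const char *p, size_t *len)
{
    static const struct { const char *text; enum token tok; } ops[] = {
        { "\xE2\x86\x90", LEFT_ARROW },   /* "←" (U+2190) in UTF-8 */
        { ":=",           LEFT_ARROW },
        { "\xE2\x86\x92", RIGHT_ARROW },  /* "→" (U+2192) in UTF-8 */
        { "=:",           RIGHT_ARROW },
    };
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        size_t n = strlen(ops[i].text);
        /* strncmp stops at a NUL in p, so short input is safe */
        if (strncmp(p, ops[i].text, n) == 0) {
            *len = n;
            return ops[i].tok;
        }
    }
    *len = 0;
    return NO_MATCH;
}
```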

Is EOF hidden in txt file?

I have made an .exe file (echo_eof.exe) which is written in C.
The code goes like this:
#include <stdio.h>
int main(void)
{
    int ch;
    while ((ch = getchar()) != EOF)
        putchar(ch);
}
Then I typed echo_eof < words.txt in the Windows cmd prompt, where words.txt contains
Hello world!
The command output is
Hello world!
I have never typed EOF in the text file but it seems like EOF is hidden in the text file. Is this true? If it is, is there a way to see the hidden EOF in the text file?
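EOF is not a character stored in the file. When your reading function is at the end of the file and cannot get another character, getchar() returns the special value EOF, a negative int that differs from every valid character value, to tell you that you have reached the end of the input.
It is not in the file; it is a signal from the file-handling layer.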
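A minimal C sketch that makes this visible: it counts the bytes a stream delivers before getc() returns EOF (count_bytes is a hypothetical helper). For a binary-mode stream holding "Hello world!\n" it counts 13 bytes, exactly the data written, with no extra EOF byte. It also shows why ch must be an int: the C standard defines EOF as a negative int, outside the range of byte values.

```c
#include <stdio.h>

/* Counts the bytes a stream delivers before getc() reports end of input.
 * Returns -1 on a read error. ch is an int so EOF stays distinguishable
 * from every byte value 0..255. */
long count_bytes(FILE *f)
{
    long n = 0;
    int ch;
    while ((ch = getc(f)) != EOF)
        n++;
    return ferror(f) ? -1 : n;
}
```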

lex program to count the Number of Words

I made the following lex program to count the number of words in a text file. A "word" for me is any string that starts with a letter and is followed by zero or more letters, digits, or underscores.
%option noyywrap
%{
int words;
%}
%%
[a-zA-Z][a-zA-Z0-9_]* {words++; printf("%s %d\n",yytext,words);}
. ;
%%
int main(int argc, char* argv[])
{
    if (argc == 2)
    {
        yyin = fopen(argv[1], "r");
        if (yyin == NULL) {
            perror(argv[1]);
            return 1;
        }
        yylex();
        printf("No. of Words : %d\n", words);
        fclose(yyin);
    }
    else
        printf("Invalid No. of Arguments\n");
    return 0;
}
The problem is that for the text file below, I get No. of Words : 13. I tried printing yytext, and it shows that manav from 9manav is counted as a word even though it does not match my definition of a word.
I also tried adding [0-9][a-zA-Z0-9_]* ; to my rules, but it still shows the same output. Why is this happening, and how can I avoid it?
Text file:
the quick brown fox jumps right over the lazy dog cout for
9manav
-99-7-5 32 69 99 +1
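First, manav perfectly matches your definition of a word. The 9 in front of it is matched by the . rule, and the scanner then starts a new match at m. Remember that white space is not special in lex: a pattern does not need white space around it to match.
You had the right idea by adding another rule [0-9][a-zA-Z0-9_]* ;. Because the rule set is ambiguous (there are several ways to match the input), lex picks the longest match, and only when two rules match the same length does the rule listed first win. At 9manav the new rule matches six characters while . matches only one, so the new rule takes priority. Putting it before the word rule does no harm, but rule order only breaks ties between matches of equal length.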
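A minimal C sketch of the scanner with the extra rule, assuming lex's longest-match behaviour. count_words is a hypothetical name; it silently skips newlines, whereas the question's scanner echoes them via the default rule.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if c may continue a word: [a-zA-Z0-9_]. */
static int is_word_char(int c)
{
    return isalnum(c) || c == '_';
}

/* Mirrors a scanner with the rules
 *   [0-9][a-zA-Z0-9_]*    ;          skip tokens starting with a digit
 *   [a-zA-Z][a-zA-Z0-9_]* words++;   count tokens starting with a letter
 *   anything else         ;          consumed one byte at a time
 * Both word-like rules take the longest possible run, as lex does. */
int count_words(const char *s)
{
    int words = 0;
    size_t i = 0;
    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];
        if (isalpha(c) || isdigit(c)) {
            size_t start = i;
            while (s[i] != '\0' && is_word_char((unsigned char)s[i]))
                i++;                                  /* longest match */
            if (isalpha((unsigned char)s[start]))
                words++;                              /* starts with a letter */
        } else {
            i++;                                      /* like the . rule */
        }
    }
    return words;
}
```

On the question's text file this counts 12 words: 9manav is consumed whole by the digit rule, and the numbers on the last line contribute nothing.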

What is the fastest way to autobreak a line of gigabytes separated by keywords using bash shell?

For example, given a line a11b12c22d322 e..., where the fields are separated by runs of digits or spaces, we want to transform it into
a
b
c
d
e
...
sed needs to read the whole line into memory. For a line of several gigabytes that is inefficient, and the job cannot be done at all if there isn't enough memory.
EDIT:
Could anyone please explain how grep, tr, awk, perl, and python manage memory when reading a large file? What, and how much, do they read into memory at a time?
If you use gawk (the default awk on many Linux distributions), you can set the RS variable so that runs of digits or white space are recognized as record terminators instead of a newline.
awk '{print}' RS='[[:digit:][:space:]]+' file.txt
As to your second question, all of these programs need to read some fixed number of bytes into an internal buffer and search it for their idea of a line separator, to simulate reading a single line at a time. To prevent a program from reading too much data while it searches for the end of the line, you need to change the program's idea of what terminates a line.
Most languages let you do this, but only with a single character. gawk makes it easy by letting you use a regular expression as the record separator, which saves you from implementing the fixed-size buffer and end-of-line search yourself.
Fastest... You can do it with the help of gcc. Here's a version that reads data from the given file name if one is given, otherwise from stdin. If this is still too slow, you can see whether replacing getchar() and putchar() (which may be macros and should optimize very well) with your own buffering code makes it faster. If we want to get ridiculous, for even more speed you could use three threads: one core lets the kernel copy the next block of data, another does the processing, and a third copies processed output back to the kernel.
#!/bin/bash
set -e
BINNAME=$(mktemp)
trap 'rm -f "$BINNAME"' EXIT   # remove the temporary binary on exit
gcc -xc -O3 -o "$BINNAME" - <<"EOF"
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    int sep = 0;
    /* speed is a requirement, so let's reduce I/O overhead */
    const int bufsize = 1024*1024;
    setvbuf(stdin, malloc(bufsize), _IOFBF, bufsize);
    setvbuf(stdout, malloc(bufsize), _IOFBF, bufsize);
    /* above buffers intentionally not freed, it doesn't really matter here */
    int ch;
    while ((ch = getc(stdin)) != EOF) {
        if (isdigit(ch) || isspace(ch)) {
            if (!sep) {
                if (putc('\n', stdout) == EOF) break;
                sep = 1;
            }
        } else {
            sep = 0;
            if (putc(ch, stdout) == EOF) break;
        }
    }
    /* flush would happen in the on-exit handler, as the buffer is not freed,
       but flushing here detects write errors for the program's exit code */
    fflush(stdout);
    return ferror(stdin) || ferror(stdout);
}
EOF
if [ -z "$1" ]; then
    "$BINNAME"
else
    "$BINNAME" <"$1"
fi
Edit: I happened to look at GNU/Linux stdio.h. Some notes: putchar/getchar are not macros, but putc/getc are, so using those instead might be a slight optimization, probably avoiding one function call; I changed the code to reflect this. While at it, I also added checking of putc's return value.
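The answer suggests replacing the per-character calls with your own buffering code if more speed is needed. Here is one possible C sketch of that idea; split_stream and the 64 KiB CHUNK size are assumptions, and it has not been benchmarked. It reads and writes whole blocks with fread and fwrite and carries the "inside a separator run" flag across block boundaries, so memory use stays at two fixed buffers no matter how long the line is.

```c
#include <ctype.h>
#include <stdio.h>

#define CHUNK (1 << 16)   /* assumed block size; tune for your system */

/* Copies in to out, replacing each run of digits or white space with a
 * single '\n'. Returns 0 on success, 1 on a read or write error. */
int split_stream(FILE *in, FILE *out)
{
    static unsigned char ibuf[CHUNK], obuf[CHUNK];
    int sep = 0;            /* carried across blocks: inside a separator run? */
    size_t n;
    while ((n = fread(ibuf, 1, CHUNK, in)) > 0) {
        size_t o = 0;
        for (size_t i = 0; i < n; i++) {
            unsigned char c = ibuf[i];
            if (isdigit(c) || isspace(c)) {
                if (!sep) {
                    obuf[o++] = '\n';   /* first byte of a separator run */
                    sep = 1;
                }
            } else {
                obuf[o++] = c;
                sep = 0;
            }
        }
        /* each input byte yields at most one output byte, so o <= CHUNK */
        if (fwrite(obuf, 1, o, out) != o)
            return 1;
    }
    return ferror(in) ? 1 : 0;
}
```

In the script above, main could call split_stream(stdin, stdout) and then check fflush(stdout) for write errors.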
With grep:
$ grep -o '[^0-9 ]\+' <<< "a11b12c22d322 e"
a
b
c
d
e
With sed:
$ sed 's/[0-9 ]\+/\n/g' <<< "a11b12c22d322 e"
a
b
c
d
e
With awk:
$ awk 'gsub(/[0-9 ]+/,"\n")' <<< "a11b12c22d322 e"
a
b
c
d
e
I'll let you benchmark.
Try tr, which works on fixed-size blocks rather than lines, so its memory use does not grow with line length:
tr -s '[:digit:][:space:]' '\n' <<< "a11b12c22d322e"
That yields:
a
b
c
d
e