ruamel.yaml.cmd rt breaks lists, if containing long string, or hash - ruamel.yaml

I just noticed that the command-line tool, called like this: "ruamel.yaml.cmd rt --save $YAML_FILE", breaks lists that contain either long strings or hashes:
Example list containing a hash:
Source:
telegraf::inputs:
cpu:
- percpu: true
totalcpu: true
report_active: true
Output:
telegraf::inputs:
cpu:
- percpu: true
totalcpu: true
report_active: true
Example list containing a long string:
Source:
rsyslog::config::snippets:
00_forward:
ensure: 'present'
lines:
- 'if $syslogfacility != 1 then {'
- 'action(Name="collector-syslog" Type="omfwd" Target="%{hiera("rsyslog_server")}" Port="514" Action.ResumeInterval="5" Protocol="tcp")'
- '}'
Output:
rsyslog::config::snippets:
00_forward:
ensure: present
lines:
- if $syslogfacility != 1 then {
- action(Name="collector-syslog" Type="omfwd" Target="%{hiera("rsyslog_server")}"
Port="514" Action.ResumeInterval="5" Protocol="tcp")
- '}'
I already created a bug report for this, but it was deleted with a comment pointing to https://yaml.readthedocs.io/en/latest/example.html?highlight=indent#output-of-dump-as-a-string.
But I am not sure how this code snippet is supposed to help me with the command-line tool.
Or is the tool deprecated, and do I have to roll my own?

The automatic detection of the indent seems incorrect for your input, as that input is inconsistent (your mappings are indented 2 positions and your sequences 4 positions with an offset for the block sequence indicator of 2). ruamel.yaml.cmd as on PyPI doesn't support different indentation levels for sequences and mappings (ruamel.yaml didn't when that was written, it does now).
Apart from that, older versions of ruamel.yaml.cmd (before 2020-12-01) do not let you set the line width for the output, so they wrap at the default of 80 characters.
I recommend you upgrade to 0.5.6 and use the command line options:
yaml rt --indent 2 --width 1024 --save <yourfile>
The appropriate repository for ruamel.yaml.cmd is https://sourceforge.net/p/ruamel-yaml-cmd/code/ci/default/tree/. A bug report on ruamel.yaml, which can only be used from a Python program, should include the minimal source code of a program that reproduces the error; issues without it are removed, as announced on its create-issue page.
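To see the inconsistency the answer describes, here is a hypothetical, stdlib-only Python helper (not part of ruamel.yaml or ruamel.yaml.cmd) that measures, in simple block-style YAML, how far nested mappings, sequence item contents, and `- ` indicators are indented under their parent key. The sample is the first example with its indentation restored the way the answer describes it; that restoration is an assumption, since the indentation was lost in the listing above.

```python
def indentation_profile(yaml_text):
    """Collect the indentation steps used below 'key:' lines in block-style YAML.

    mapping  - extra indent of a nested mapping key relative to its parent key
    sequence - extra indent of a sequence item's content relative to its parent key
    offset   - extra indent of the '- ' indicator relative to its parent key
    """
    mapping, sequence, offset = set(), set(), set()
    # Ignore blank lines and full-line comments
    lines = [l for l in yaml_text.splitlines()
             if l.strip() and not l.lstrip().startswith("#")]
    for parent, child in zip(lines, lines[1:]):
        if not parent.rstrip().endswith(":"):
            continue  # parent has an inline value, so nothing is nested under it
        p_stripped = parent.lstrip(" ")
        parent_col = len(parent) - len(p_stripped)
        if p_stripped.startswith("- "):
            parent_col += 2  # the key sits after the sequence indicator
        c_stripped = child.lstrip(" ")
        child_col = len(child) - len(c_stripped)
        if c_stripped.startswith("- "):
            offset.add(child_col - parent_col)
            sequence.add(child_col + 2 - parent_col)
        else:
            mapping.add(child_col - parent_col)
    return {"mapping": sorted(mapping), "sequence": sorted(sequence),
            "offset": sorted(offset)}


sample = """\
telegraf::inputs:
  cpu:
    - percpu: true
      totalcpu: true
      report_active: true
"""
print(indentation_profile(sample))  # {'mapping': [2], 'sequence': [4], 'offset': [2]}
```

Because the mapping indent (2) and the sequence indent (4) differ, a tool that applies a single indent to both cannot reproduce this layout exactly on a round trip.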

Related

perl YAML.pm dump array without quotes

I am writing a Perl script to generate a docker-compose.yml file. I am using the YAML.pm module's DumpFile method to write a complex hash to a file in YAML format.
Some of the arrays are dumping correctly, that is to say, the elements are unquoted, e.g.,
"environment" => [
'MYSQL_ROOT_PASSWORD=secret', 'MYSQL_DATABASE=db',
'MYSQL_USER=dbadmin', 'MYSQL_PASSWORD=secret2',
],
becomes
environment:
- MYSQL_ROOT_PASSWORD=secret
- MYSQL_DATABASE=db
- MYSQL_USER=dbadmin
- MYSQL_PASSWORD=secret2
However there are some arrays that contain what are supposed to be arguments with values comprised of environment variables that docker-compose will get from the environment, i.e., I CANNOT output the raw values here, I need to output the env var that docker-compose will use to get the value. These are being dumped quoted:
"args" => [
'APP_CODE_PATH=${APP_CODE_PATH_CONTAINER}',
'APP_GROUP=${APP_GROUP}',
'APP_GROUP_ID=${APP_GROUP_ID}',
'APP_USER=${APP_USER}',
'APP_USER_ID=${APP_USER_ID}',
'TARGET_PHP_VERSION=${PHP_VERSION}',
'TZ=${TIMEZONE}'
],
becomes:
args:
- 'APP_CODE_PATH=${APP_CODE_PATH_CONTAINER}'
- 'APP_GROUP=${APP_GROUP}'
- 'APP_GROUP_ID=${APP_GROUP_ID}'
- 'APP_USER=${APP_USER}'
- 'APP_USER_ID=${APP_USER_ID}'
- 'TARGET_PHP_VERSION=${PHP_VERSION}'
- 'TZ=${TIMEZONE}'
The output is not supposed to be quoted.
I've combed the YAML.pm docs, but I cannot find anything specific to this question there.
I suspect it is how I'm entering the values in the array, but I can't figure out what I'm doing wrong.
Use YAML::Syck instead. You can turn off that quoting, and it's off by default:
use v5.26;
use YAML::Syck;
my $hash = {
"args" => [
'APP_CODE_PATH=${APP_CODE_PATH_CONTAINER}',
'APP_GROUP=${APP_GROUP}',
],
};
print Dump($hash);
Now it's unquoted:
---
args:
- APP_CODE_PATH=${APP_CODE_PATH_CONTAINER}
- APP_GROUP=${APP_GROUP}
You could use YAML::PP instead. It tries to quote things only when really necessary, or when the unquoted value would look ambiguous.
YAML.pm is old: it was written for YAML 1.0, and it has a lot of bugs (as does YAML::Syck). See matrix.yaml.io.
use YAML::PP qw(Dump);
my $data = { args => [ 'APP_CODE_PATH=${APP_CODE_PATH_CONTAINER}', 'APP_GROUP=${APP_GROUP}' ] };
print Dump($data);
Also, YAML::PP supports the official YAML 1.1 and 1.2 schemas (regarding numbers, booleans, etc.), while YAML.pm, YAML::XS and YAML::Syck do not.
(Disclaimer: I'm the author of YAML::PP)
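Regarding "The output is not supposed to be quoted": that should not actually matter. A single-quoted scalar like 'APP_GROUP=${APP_GROUP}' loads as exactly the same string as its unquoted form, so I don't think anything would stop working just because the values are quoted, and I can't imagine why it would.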
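The quoting question can be illustrated with a simplified, hypothetical Python check for whether a string must be quoted as a block-context plain scalar. It is not YAML::PP's (or YAML.pm's) actual logic, and it is deliberately conservative; it only shows that `$`, `{` and `}` in the middle of a value such as `APP_GROUP=${APP_GROUP}` do not force quoting, while values that would load as something other than a string do.

```python
# Characters that may not start a plain (unquoted) scalar; a conservative superset.
_INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")
# Plain scalars that a YAML 1.1 loader would read as booleans or null.
_SPECIAL = {"true", "false", "yes", "no", "on", "off", "y", "n", "null", "~"}


def needs_quotes(value: str) -> bool:
    """Rough check whether `value` must be quoted as a block-context plain scalar."""
    if value == "" or value != value.strip():
        return True  # empty, or leading/trailing whitespace would be lost
    if value[0] in _INDICATORS:
        return True  # would be read as an indicator character
    if ": " in value or " #" in value or value.endswith(":"):
        return True  # would start a mapping or a comment
    if value.lower() in _SPECIAL:
        return True  # would not load back as a string
    try:
        float(value)
        return True  # would load as a number
    except ValueError:
        return False


for v in ["APP_GROUP=${APP_GROUP}", "MYSQL_DATABASE=db", "true", "8080", "a: b"]:
    print(repr(v), needs_quotes(v))
```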

Doxygen EXCLUDE_PATTERNS regex

I am attempting to exclude certain files from my doxygen generated documentation. I am using version 1.8.14.
My files come in this naming convention:
/Path2/OtherFile.cs
/Path/DAL.Entity/Source.cs
/Path/DAL.Entity/SourceBase.generated.cs
I want to exclude all files that do NOT end in Base.generated.cs, and are located inside of /Path/.
Since it appears doxygen claims to use regex for the exclude_patterns variable, I eventually came up with this:
.*\\Path\\DAL\..{4,15}\\((?<!Base\.generated).)*
Needless to say, it did not work. Nor did multiple other variations. So far a simple wildcard * is the only regex character I have gotten to actually work.
doxygen uses QRegExp for a lot of things, so I assumed that was the library used for this variable as well, but even several variations of a pattern that that library claims to support did not work. Granted, that library is apparently full of bugs, but I would expect some things to work.
Does doxygen actually use a regex library for this variable?
If so, which library is it?
In either case, is there a method of achieving my goal?
My conclusion is: no. The Doxyfile does not support real regex, even though they claim it does. Only standard wildcards work.
We ended up with a really awkward solution to work around this.
What we did was add a macro in our CMakeLists.txt that builds a string with everything we want to include in INPUT instead, manually excluding the parts we don't want.
The sad part is that CMake's regex support is also crippled, so we couldn't use advanced regex such as negative lookahead in LIST(FILTER ... EXCLUDE REGEX ...), e.g. LIST(FILTER children EXCLUDE REGEX "^((?!autogen/public).)*$"). So even this solution is not really what we wanted.
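The question's goal can be stated either as one regex with a negative lookahead or as two plain wildcard tests (in /Path/, and not ending in Base.generated.cs). The Python sketch below, which does not involve Doxygen at all, checks that both formulations select the same files from the question's examples. Python's `fnmatch` is used as a stand-in for wildcard matching; its `*` also matches `/`, which may not behave exactly like Doxygen's own matcher (an assumption).

```python
import re
from fnmatch import fnmatchcase

# One regex with a negative lookahead: inside /Path/ but NOT ending in Base.generated.cs.
_EXCLUDE = re.compile(r"^/Path/(?!.*Base\.generated\.cs$)")


def excluded_regex(path):
    return _EXCLUDE.match(path) is not None


def excluded_wildcards(path):
    # The same rule split into two plain wildcard tests: match one pattern,
    # then drop anything matching the second. No lookaround needed.
    return fnmatchcase(path, "/Path/*") and not fnmatchcase(path, "*Base.generated.cs")


files = ["/Path2/OtherFile.cs",
         "/Path/DAL.Entity/Source.cs",
         "/Path/DAL.Entity/SourceBase.generated.cs"]
for f in files:
    print(f, excluded_regex(f), excluded_wildcards(f))
```

A single EXCLUDE_PATTERNS wildcard cannot express the "and not" part, which is why the workaround builds the INPUT list outside Doxygen.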
Our CMakeLists.txt ended up looking something like this
cmake_minimum_required(VERSION 3.9)
project(documentation_html LANGUAGES CXX)
find_package(Doxygen REQUIRED dot)
# Custom macros
## Macro for getting all relevant directories when creating HTML documentation.
## This was created because the regex matching in Doxygen and CMake lacks support for more
## advanced syntax.
MACRO(SUBDIRS result current_dir include_regex)
FILE(GLOB_RECURSE children ${current_dir} ${current_dir}/*)
LIST(FILTER children INCLUDE REGEX "${include_regex}")
SET(dir_list "")
FOREACH(child ${children})
get_filename_component(path ${child} DIRECTORY)
IF(${path} MATCHES ".*autogen/public.*$" OR NOT ${path} MATCHES ".*build.*$") # If we have the /source/build/autogen/public folder available we create the doxygen for those interfaces also.
LIST(APPEND dir_list ${path})
ENDIF()
ENDFOREACH()
LIST(REMOVE_DUPLICATES dir_list)
string(REPLACE ";" " " dirs "${dir_list}")
SET(${result} ${dirs})
ENDMACRO()
SUBDIRS(DOCSDIRS "${CMAKE_SOURCE_DIR}/docs" ".*.plantuml$|.*.puml$|.*.md$|.*.txt$|.*.sty$|.*.tex$")
SUBDIRS(SOURCEDIRS "${CMAKE_SOURCE_DIR}/source" ".*.cpp$|.*.hpp$|.*.h$|.*.md$")
# Common config
set(DOXYGEN_CONFIG_PATH ${CMAKE_SOURCE_DIR}/docs/doxy_config)
set(DOXYGEN_IN ${DOXYGEN_CONFIG_PATH}/Doxyfile.in)
set(DOXYGEN_IMAGE_PATH ${CMAKE_SOURCE_DIR}/docs)
set(DOXYGEN_PLANTUML_INCLUDE_PATH ${CMAKE_SOURCE_DIR}/docs)
set(DOXYGEN_OUTPUT_DIRECTORY docs)
# HTML config
set(DOXYGEN_INPUT "${DOCSDIRS} ${SOURCEDIRS}")
set(DOXYGEN_EXCLUDE_PATTERNS "*/tests/* */.*/*")
set(DOXYGEN_FILE_PATTERNS "*.cpp *.hpp *.h *.md")
set(DOXYGEN_RECURSIVE NO)
set(DOXYGEN_GENERATE_LATEX NO)
set(DOXYGEN_GENERATE_HTML YES)
set(DOXYGEN_HTML_DYNAMIC_MENUS NO)
configure_file(${DOXYGEN_IN} ${CMAKE_BINARY_DIR}/DoxyHTML @ONLY)
add_custom_target(docs
COMMAND ${DOXYGEN_EXECUTABLE} ${CMAKE_BINARY_DIR}/DoxyHTML -d Markdown
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMENT "Generating documentation"
VERBATIM)
and in the Doxyfile.in template we added @-delimited placeholders for those fields, which configure_file replaces with the CMake variables set above:
OUTPUT_DIRECTORY = @DOXYGEN_OUTPUT_DIRECTORY@
INPUT = @DOXYGEN_INPUT@
FILE_PATTERNS = @DOXYGEN_FILE_PATTERNS@
RECURSIVE = @DOXYGEN_RECURSIVE@
EXCLUDE_PATTERNS = @DOXYGEN_EXCLUDE_PATTERNS@
IMAGE_PATH = @DOXYGEN_IMAGE_PATH@
GENERATE_HTML = @DOXYGEN_GENERATE_HTML@
HTML_DYNAMIC_MENUS = @DOXYGEN_HTML_DYNAMIC_MENUS@
GENERATE_LATEX = @DOXYGEN_GENERATE_LATEX@
PLANTUML_INCLUDE_PATH = @DOXYGEN_PLANTUML_INCLUDE_PATH@
After this we can run cd ./build && cmake ../ && make docs to create our HTML documentation and have it include the autogenerated interfaces in our source folder without including all the other directories in the build folder.
Quick description of what actually happens in the CMakeLists.txt
# Macro that gets all directories from current_dir recursively and returns the result to result as a space separated string
MACRO(SUBDIRS result current_dir include_regex)
# Gets all files recursively from current_dir
FILE(GLOB_RECURSE children ${current_dir} ${current_dir}/*)
# Filter files so we only keep the files that match the include_regex (the regex can't be too advanced)
LIST(FILTER children INCLUDE REGEX "${include_regex}")
SET(dir_list "")
# Let us act on all files... :)
FOREACH(child ${children})
# We're only interested in the path. So we get the path part from the file
get_filename_component(path ${child} DIRECTORY)
# Since CMake's regex support is also crippled, we can't do nice things such as LIST(FILTER children EXCLUDE REGEX "^((?!autogen/public).)*$"), which would have been preferred (CMake regex does not understand negative lookahead/lookbehind)... So we ended up with this ugly thing instead: add all build/autogen/public paths and no other paths inside build. It is probably possible to write this expression as a regex without negative lookahead, but I'm not really fluent in regex (who is... right?) and was a bit lazy in this case. We just needed to get this one-point task done... :P
IF(${path} MATCHES ".*autogen/public.*$" OR NOT ${path} MATCHES ".*build.*$")
LIST(APPEND dir_list ${path})
ENDIF()
ENDFOREACH()
# Remove all duplicates... Since we GLOBed all files there are a lot of them. So this is important or Doxygen INPUT will overflow... I know... I tested...
LIST(REMOVE_DUPLICATES dir_list)
# Convert the dir_list to a space separated string
string(REPLACE ";" " " dirs "${dir_list}")
# Return the result! Coffee and cinnamon buns for everyone!
SET(${result} ${dirs})
ENDMACRO()
# Get all the paths that we want to include in our documentation ... this is also where the build folders for the different applications are going to be... with our autogenerated interfaces which we want to keep.
SUBDIRS(SOURCEDIRS "${CMAKE_SOURCE_DIR}/source" ".*.cpp$|.*.hpp$|.*.h$|.*.md$")
# Add the dirs we want to the Doxygen INPUT
set(DOXYGEN_INPUT "${SOURCEDIRS}")
# Normal exclude patterns for stuff we don't want to add. This thing does not support regex... even though it should.
set(DOXYGEN_EXCLUDE_PATTERNS "*/tests/* */.*/*")
# Normal use of the file patterns that we want to keep in the documentation
set(DOXYGEN_FILE_PATTERNS "*.cpp *.hpp *.h *.md")
# IMPORTANT! Since we are creating all the INPUT paths ourselves we don't want Doxygen to do any recursion for us
set(DOXYGEN_RECURSIVE NO)
# Write the config
configure_file(${DOXYGEN_IN} ${CMAKE_BINARY_DIR}/DoxyHTML @ONLY)
# Create the target that will use that config to create the html documentation
add_custom_target(docs
COMMAND ${DOXYGEN_EXECUTABLE} ${CMAKE_BINARY_DIR}/DoxyHTML -d Markdown
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMENT "Generating documentation"
VERBATIM)
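The filtering logic of the SUBDIRS macro can be prototyped outside CMake. Below is a rough Python equivalent, a sketch that takes an already-globbed file list instead of calling FILE(GLOB_RECURSE). `subdirs` and the sample paths are hypothetical, and Python's `re.search` stands in for CMake's MATCHES; that is close enough for these simple patterns, but the two regex engines are not identical.

```python
import re
from posixpath import dirname


def subdirs(files, include_regex):
    """Return a space-separated string of directories, mimicking the SUBDIRS macro."""
    include = re.compile(include_regex)
    dir_list = []
    for child in files:
        # LIST(FILTER children INCLUDE REGEX "${include_regex}")
        if not include.search(child):
            continue
        # get_filename_component(path ${child} DIRECTORY)
        path = dirname(child)
        # Keep build/autogen/public, drop every other path containing "build"
        if re.search(r"autogen/public", path) or not re.search(r"build", path):
            dir_list.append(path)
    # LIST(REMOVE_DUPLICATES dir_list) keeps the first occurrence of each entry
    dir_list = list(dict.fromkeys(dir_list))
    # string(REPLACE ";" " " dirs "${dir_list}")
    return " ".join(dir_list)


files = ["/src/source/app/main.cpp",
         "/src/source/app/util.hpp",
         "/src/source/build/autogen/public/api.h",
         "/src/source/build/obj/tmp.h",
         "/src/source/app/data.bin"]
print(subdirs(files, ".*.cpp$|.*.hpp$|.*.h$|.*.md$"))
# /src/source/app /src/source/build/autogen/public
```

Like the macro, the `build` test looks at the whole absolute path, so a checkout that itself lives under a directory whose name contains `build` would have all of its directories dropped.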
I know this isn't the answer anyone who stumbles in on this question wants... unfortunately it seems to be the only reasonable solution...
... you all have my deepest condolences...

How to resolve PintOS unrecognized character \x16

I downloaded and set up PintOS and the dependencies on my home computer, but when I try to run pintos run alarm-multiple, I get the error:
Unrecognized character \x16; marked by <-- HERE after if ($<-- HERE near column 7 at ~/code/pintos/src/utils/pintos line 911.
That line has ^V on it, the synchronous idle control character. I haven't been able to find any information on this problem; it seems like I'm the only one experiencing it.
I have Perl v5.26.0 installed.
Use of literal control characters in variable names was deprecated in Perl 5.20:
Literal control characters in variable names
This deprecation affects things like $\cT, where \cT is a literal control (such as a NAK or NEGATIVE ACKNOWLEDGE character) in the source code. Surprisingly, it appears that originally this was intended as the canonical way of accessing variables like $^T, with the caret form only being added as an alternative.
The literal control form is being deprecated for two main reasons. It has what are likely unfixable bugs, such as $\cI not working as an alias for $^I, and their usage not being portable to non-ASCII platforms: While $^T will work everywhere, \cT is whitespace in EBCDIC. [perl #119123]
The code causing this problem was fixed in PintOS with this commit in 2016:
committer Ben Pfaff <blp@cs.stanford.edu>
Tue, 9 Feb 2016 04:47:10 +0000 (20:47 -0800)
Modern versions of Perl prefer a caret in variable names over literal
control characters and issue a warning if the control character is used.
This fixes the warning.
diff --git a/src/utils/pintos b/src/utils/pintos
index 1564216..2ebe642 100755 (executable)
--- a/src/utils/pintos
+++ b/src/utils/pintos
@@ -912,7 +912,7 @@ sub get_load_average {
# Calls setitimer to set a timeout, then execs what was passed to us.
sub exec_setitimer {
if (defined $timeout) {
- if ($\16 ge 5.8.0) {
+ if ($^V ge 5.8.0) {
eval "
use Time::HiRes qw(setitimer ITIMER_VIRTUAL);
setitimer (ITIMER_VIRTUAL, $timeout, 0);
Perl 5.26 made it a fatal error to use literal control characters in variable names.
To fix it, make sure you are using the most recent version of PintOS. The command git clone git://pintos-os.org/pintos-anon ought to do it.
^V is a perlvar. For reasons unknown to me, it was encoded not as ^V (a caret followed by V) but as a single literal control character (\x16), which caused the program to fail.
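If upgrading is not an option, the change from that commit can be applied mechanically. The hypothetical Python helper below rewrites `$` followed by a literal control character into the caret form; control characters 0x01 to 0x1A correspond to ^A to ^Z, so `\x16` becomes `^V`. Whitespace control characters (tab, newline, etc.) are skipped, since a `$` at the end of a line is ordinary Perl.

```python
import re

# '$' followed by a literal control character 0x01-0x1A, skipping tab, newline,
# vertical tab, form feed and carriage return (0x09-0x0D), which are whitespace.
_CONTROL_VAR = re.compile(r"\$([\x01-\x08\x0e-\x1a])")


def caretize_control_vars(perl_source):
    """Rewrite $<literal ^X> as $^X (0x16 + 0x40 == ord('V'), so \\x16 becomes ^V)."""
    return _CONTROL_VAR.sub(lambda m: "$^" + chr(ord(m.group(1)) + 0x40), perl_source)


line = "if ($\x16 ge 5.8.0) {"
print(caretize_control_vars(line))  # if ($^V ge 5.8.0) {
```

Applied to the contents of src/utils/pintos, this should produce the same one-line change as the commit above, but getting the current PintOS sources remains the cleaner fix.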

How do I fix "Unescaped left brace in regex is deprecated" error in Parse::Yapp?

The Parse::Yapp currently shipping on Ubuntu 16.04 (xenial) is slightly behind Perl in that it uses unescaped '{'s in regular expressions. The error message indicates that the problem is in YappParse.yp, which doesn't exist. In the interest of patching it locally until a new version of Parse::Yapp comes down the pipe, which file is it actually in?
{yapp}
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\G{ <-- HERE / at YappParse.yp line 288.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/\G%{ <-- HERE / at YappParse.yp line 315.
Parse::Yapp hasn't had a release since 2001. I wouldn't hold my breath. Since it's effectively unmaintained I'd recommend either moving whatever you're using off of it or taking over maintenance. Consider something like Pegex or Regexp::Grammars instead.
Fortunately this problem has been reported twice, and both reports contain patches. See rt.cpan.org 114776 and rt.cpan.org 10668.
Maintenance has been picked up and 1.20 appears to fix the problem.
perldb's stack trace revealed (or at least implied) that the file is Parse/Yapp/Parse.pm. Here's a patch:
diff -u /usr/share/perl5/Parse/Yapp/Parse.pm{~,}
--- /usr/share/perl5/Parse/Yapp/Parse.pm~ 2001-05-20 07:19:57.000000000 -0400
+++ /usr/share/perl5/Parse/Yapp/Parse.pm 2016-09-18 02:12:09.116799976 -0400
@@ -880,7 +880,7 @@
return($1, [ $1, $lineno[0] ]);
};
- $$input=~/\G{/gc
+ $$input=~/\G\{/gc
and do {
my($level,$from,$code);
@@ -907,7 +907,7 @@
and return('START',[ undef, $lineno[0] ]);
$$input=~/\G%(expect)/gc
and return('EXPECT',[ undef, $lineno[0] ]);
- $$input=~/\G%{/gc
+ $$input=~/\G%\{/gc
and do {
my($code);
Hopefully this will save others some detective work.
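The two hunks in that patch follow one rule: escape a `{` that directly follows `\G` or `\G%`. A hypothetical Python helper for applying that rule line by line is sketched below; it covers only those two spots and is not a general fix for unescaped braces, so editing /usr/share/perl5/Parse/Yapp/Parse.pm by hand or upgrading to 1.20 is just as reasonable.

```python
import re

# An unescaped '{' immediately after \G or \G% (the two spots the patch changes).
_BRACE_AFTER_G = re.compile(r"(\\G%?)\{")


def escape_yapp_braces(perl_line):
    r"""Turn /\G{/ into /\G\{/ and /\G%{/ into /\G%\{/; escaped braces are left alone."""
    return _BRACE_AFTER_G.sub(r"\1\\{", perl_line)


print(escape_yapp_braces(r"$$input=~/\G{/gc"))   # $$input=~/\G\{/gc
print(escape_yapp_braces(r"$$input=~/\G%{/gc"))  # $$input=~/\G%\{/gc
```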
This also occurs if you are calling a ksh shell script (such as print) from a non-ksh shell (such as bash).

Diff command - avoiding monolithic grouping of consecutive differing lines

Playing around with the standard Linux diff command, I could not find a way to avoid the following type of grouping in its output (the output listings here assume the unified format):
This question aims at the case that each line differs by little from its counterpart in the other file, and it's more useful to see each line next to its counterpart.
I would like instead of having groups like this show up in the comparison output:
- line 1
- line 2
- line 3
+ line 1 modified
+ line 2 modified
+ line 3 modified
To get this:
- line 1
+ line 1 modified
- line 2
+ line 2 modified
- line 3
+ line 3 modified
Of course, this is a convenience question, as this can be accomplished by writing your own code to post-process the diff output, or by diverging from the LCS algorithm with your own algorithm. I don't think variants like wdiff would help much: the plain diff -U0 output format fits my needs very well except for this grouping property, whereas wdiff introduces other aspects that are not optimal for my case.
I'm looking for a command-line way, or a library that can be used in code, not a UI tool.
I was trying to solve this myself. The closest I got was this:
diff -y -W 10000 file1 file2 | grep '|' | sed 's/\s*|\s*/\n/g'
The one issue is that this assumes there are no whitespace differences at the beginning of the lines (or that you don't care about them).
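For the library route the question asks about, one option (not taken from the answers above) is Python's standard difflib: walk the `SequenceMatcher` opcodes and, inside each changed block, emit old and new lines pairwise. `interleaved_diff` is a hypothetical name, and pairing by position only makes sense when the blocks have the same length, as in the question's example; leftover lines of a longer block are emitted after the pairs.

```python
import difflib


def interleaved_diff(a_lines, b_lines):
    """Like diff -U0 without context, but changed lines are paired one-to-one."""
    out = []
    matcher = difflib.SequenceMatcher(a=a_lines, b=b_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # no context lines, as with -U0
        old, new = a_lines[i1:i2], b_lines[j1:j2]
        # For 'replace' this alternates -/+; for a pure 'delete' or 'insert'
        # one side is empty, so only the other side is emitted.
        for k in range(max(len(old), len(new))):
            if k < len(old):
                out.append("- " + old[k])
            if k < len(new):
                out.append("+ " + new[k])
    return out


a = ["line 1", "line 2", "line 3"]
b = ["line 1 modified", "line 2 modified", "line 3 modified"]
print("\n".join(interleaved_diff(a, b)))
```

With real files, `a_lines` and `b_lines` would come from something like `open(path).read().splitlines()`.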