Publish an R code in github - github

I have written some codes in R that I have compiled into a package. Unfortunately
I am not able to publish it as a package such that any user may download it with the
install_github() function.
Kindly help.
I have shared the path for the repository below.
https://github.com/Kagereki/RPerio

When I try to install your package I get the following error:
Error: Line starting 'Population-Based Sur ...' is malformed!
The specification for DESCRIPTION files states that
DESCRIPTION uses a simple file format called DCF, the Debian control format. You can see most of the structure in the simple example below. Each line consists of a field name and a value, separated by a colon. When values span multiple lines, they need to be indented:
Description: The description of a package is usually long,
spanning multiple lines. The second and subsequent lines
should be indented, usually with four spaces.
Inspecting your DESCRIPTION file shows that its Description field is indeed formatted incorrectly, with the second line not indented:
Description: A collection of tools for Case Definitions and prevalences in
Population-Based Surveillance of Periodontitis.
The functionality is experimental and functions are likely to ...
Note that this line begins with "Population-Based Sur ...", as suggested by the error message.
Make sure your DESCRIPTION is properly formatted and see if that fixes things.
You should be able to use the check() function from devtools to make sure that everything is working locally before you push up a new version.

Related

Dataprep import dataset does not detect headers in first row automatically

I am importing a dataset from Google Cloud Storage (parameterized) into Dataprep. So far, this worked perfectly fine and one of the feature that I liked is that it auto detects that the first row in my (application/octet-stream) .csv file are my headers.
However, today I tried to import a new dataset and it did not detect the headers, but it auto assigned column1, column2...
What has changed and or why is this the case. I have checked the box auto-detect and use UTF-8:
While the auto-detect option is usually pretty good, there are times that it fails for numerous reasons. I've specifically noticed this when the field names contain certain characters (e.g. comma, invisible characters like zero-width-non-joiners, null bytes), or when multiple different styles of newline delimiters are used within the same file.
Another case I saw this is when there were more columns of data than there were headers.
As you already hit on, you can use the following snippet to do mostly the same thing:
rename type: header method: filter sanitize: true
. . . or make separate recipe steps to convert the first row to header and then bulk-rename to your own liking.
More often than not, however, I've found that when auto-detect fails on a previously working file, it tends to be a sign of some sort of issue with the source file. I would look for mismatched data, as well as misplaced commas within the output, as well as comparing the header and some data rows to the original source using a plaintext editor.
When all else fails, you can try a CSV validator . . . but in my experience they tend to be incredibly opinionated when it comes to the formatting options of the fileā€”so depending on the system generating the CSV, it could either miss any errors or give false-positives. I have had two experiences where auto-detect fails for no apparent reason on perfectly clean files, so it is possible that process was just skipped for some reason.
It should also be noted that if you have a structured file that was correctly detected but want to revert it, you can go to the dataset details, select the "..." (More) button, and choose "Remove structure..." (I'm hoping that one day they'll let you do the opposite when you want to add structure to a raw dataset or work around bugs like this!)
Best of luck!
Can be resolved as a transformation within a Flow:
rename type: header method: filter sanitize: true

Hash of a textual file for integrity purposes

I have a more general requirement to track changes in asset files that are committed into source code and deployed inside the binaries, but for now I am implementing it in unit testing context and facing a potential problem for the future. Before asking the TLDR question I will show a lot of contextual information.
Scenario
Some application assets are loaded from CSV files committed into Git repository via ClasspathResource[1] and they may sometime change. Change occurs across commits, but for a runtime application the change occurs across different versions of the application.
My test solution
I have implemented the following mechanism to alert me about changes in the resource:
#Before
public void setUp() throws Exception
{
assertEquals("Resource file has changed. Make sure the test reflects the changes in the file and update the checksum", MD5_OF_FILE,
DigestUtils.md5Hex(new ClassPathResource("META-INF/resources/assets.csv").getInputStream()));
}
Basically, I want my unit tests to fail until I explicitly code the checksum of the file. When I run md5sum assets.txt I hardcode the result into the code so tests know they are working with a fixed version of the file.
Problem
I ran the tests on my own Windows box and worked like a charm. Switching to Linux, I found that they failed. Immediately I realized that it may be due to line endings, which I totally forgot.
In the specific case, Git is configured to commit files LF but checkout (in Windows) CRLF. This configuration is reasonable for working with source code.
So I need to check if the asset file has changed in a smart way that allows a box to change/reinterpret the line endings. This is especially true for the runtime application which will store the file hash and will compare the actual assets file (which may have changed), performing corrective actions on differences ==> reloading the assets.
TL;DR
Given a textual file of which I can extract and store any hash (not just cryptographic, I used MD5), how can I tell that it has changed regardless of the environment the file is processed into, which may modify the line endings?
Note
I have requirement not to use a versioning system in the asset itself (e.g. first row has incremental version, since developers will fail to update correctly).
[1] Spring framework tool wrapping Class.getResourceAsStream
A solution could be normalizing the file to chosen line endings, i.e. always CRLF or always LF, then compute the cryptographic hash over that normalized content.
E.g. compute md5sum | dos2unix file and use a proper Stream in code that normalizes the file on the fly

Cond statement doxygen does not work

I am trying to separate out internal and external documentation using the doxygen constructs of cond; but i just cant seem to get get it working. I would essentially like to exclude some files completely and not conditionally. Regardless of where i add the tag (before include, before header guards etc) , the files and source both show up.
What i have tried in vain is to take the test file from doxegen repo for
conditional test and add it to the project.
Steps to reproduce [Linux]
create a new directory.
copy paste the above file (had to rename it to .h as .c was passed over?).
generate dummy config via doxygen -g.
update Doxyfile ENABLED_SECTION = COND_ENABLED.
Run doxygen.
check html/index.html
This however is still visible in the html documentation it generates for the project. I have set the ENABLED_SECTION variable with other values , but cond_enabled function still shows up. Running the testing directory of the project (doxygen) it passes. So i am lost.
Any suggestions?
Tried with latest version 1.8.14.
Thanks!
Regarding the \cond problems (not an answer directly to the real problem you face, I think, but to long for a comment).
The mentioned file is used in the, limited, testing doxygen can do / does and the first lines contain some instructions on what to do. Furthermore there is a default Doxyfile with the tests in use. It is hard to run a separate test outside the doxygen build tree.
Regarding the remark "Running the testing directory of the project (doxygen) it passes." This is correct, here, at the moment, only testing is done against the XML output and the generated output is compared to a once created version of the XML output. No tests are done, at the moment, in respect to HTML or PDF / LaTeX. Recently the test framework has been slightly extended so in the future this should be possible (compare the xhtml and tex output, but some work has still to be done here).
The version of the parser sees the \cond in the first line (normal C comment) as a doxygen command and skips everything till the first \endcond (your friend in these cases is always doxygen -d preprocessor). I think that removing / modifying the first line will result in an already better result. There is however another hiccup for e.g. HTML output. As the function cond_enabled is not documented and EXPAND_ALL is not set to YES the function will not appear in the documentation. So best is also to add a line of documentation with the function cond_enabled.
Regarding the seen HTML problems I modified the the relevant test in doxygen slightly and pushed a proposed patch to github (pull request 714, https://github.com/doxygen/doxygen/pull/714).
Note: the problem of skipping the \cond in normal C comment is quite a bit harder to implement (seen the logical complexity of the doxygen code in pre.l and commentcnv.l.
EDIT: 2018/06/10: The push request has been integrated in the master version on github.

Store row numbers which are causing "error"

I have to retrieve certain information from urls. For this I have to enter text into fields of the url. I am using GET operation for this. I have to modify the text to replace spaces with "%20". Some times the text(which is taken from the database) is badly formed. I would like to know the row numbers so I can manually change the text for such rows in the database and run it again. I have tried to use the logs and errors section but with little luck. Does anybody have an idea of how to do this?
First shot: Output bad urls on the console
So far, I came up with the following job design for your problem:
The trick is to catch the exceptions of the tHttpRequest component and print the necessary details on the console. For this example, I included the line number, the exception message and the URL that produced the exception.
Output (I couldn't reproduce your "Illegal character error", so I took a different one):
Second shot: Output to a file
If you really need to output the line numbers to a file, things get a little more complicated.
Instead of printing the info straight onto the console, we collect all line numbers into a context variable of type (Java) List inside the tJavaFlex. After the usual URL processing (which I have left out from the job design to keep the example small), we iterate over the Java List
and save it into a tHashOutput, so that we can finally write to a file.
We cannot directly write to the file in the tLoop section, since the Iterate flow would lead to the situation the the tFileInputDelimited would be opened several times. If "Append" was disabled, only the last bad URL line number would finally appear in the output file. If "Append" was enabled, you would get the full list of line numbers after the very first job run - but you would append every time you run the job, making the list longer and longer. Workarounds would be to use a runtime-dependent file name (e.g. timestamp) or to delete the file at the beginning of the job run. I chose the third option, that overwrites the file every time we run the job. Feel free to chose among those options the one which suits your use case best.
Details
The tHashOutput/tHashInput components are not visible on default, but must be enabled first to show up: https://www.talendforge.org/forum/viewtopic.php?pid=107249#p107249
Context variable:
INIT:
tJavaFlex "catch errors", end code:
tLoop:
tFixedFlowInput "badURL":
tHashOutput:
Needs to have "Append" enabled.

Making stable names for doxygen html docs pages

I need to refer to Doxygen documentation pages. The file names however are not stable as they change after every generation. My idea is to create a symlink to each HTML file created by Doxygen , having a stable and human friendly name. Have anyone tried this?
Actually, it might be very easy just to parse the annotated.html file Doxygen produces. Any documented class shows up there as a line like:
`<tr><td class="indexkey"><a class="el" href="dd/de6/a00548.html">
ImportantClass</a></td>`
The hard problem for me is that I would like to have my file names (i.e. the symlinks) be visible on my server like:
http://www.package.com/com.package.my.ImportantClass.html
[Yes, the code is in java]. So the question actually reads: "how to connect a HTML page by Doxygen with the right java class name and its package name.
You seem to have SHORT_NAMES enabled, which will indeed produce volatile names. When you set SHORT_NAMES to NO in the configuration file (the default), you will get longer names, but these are stable over multiple runs (i.e. they are based on the name, and for functions also on (a hash of) the parameters.