PyCharm not providing all PySpark suggestions - pyspark

I am using PyCharm as my IDE for PySpark development. I have added the pyspark library as a content root in my PyCharm project, but it still does not show the methods that can be applied when I press Ctrl+Space.
For example, in the code below, read returns a DataFrameReader object. PyCharm suggests the methods available on sparkSession, but after read I get no suggestions for DataFrameReader methods such as option, format, parquet, etc.
sparkSession.read.option("header", "true").csv("sample.csv")
Do I need some extra setting in PyCharm, or is there a better editor for PySpark development?
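PyCharm's completion generally depends on knowing, statically, what type an expression returns. A likely cause here (an assumption, not verified against your PySpark version) is that the IDE cannot infer the return type of the read property. The stand-in sketch below uses hypothetical Session and Reader classes, not PySpark's own, to show the mechanism: an annotated return type is something a static analyzer can follow, and an explicit variable annotation can supply the type when the library does not. With real PySpark the equivalent workaround would be something like `reader: DataFrameReader = sparkSession.read` with `DataFrameReader` imported from `pyspark.sql.readwriter`; newer PySpark releases also ship inline type hints, so upgrading may help as well.

```python
from typing import get_type_hints


class Reader:
    """Hypothetical stand-in for a reader class such as DataFrameReader."""

    def __init__(self) -> None:
        self.options = {}

    def option(self, key: str, value: str) -> "Reader":
        # Returning the annotated type keeps method chaining inferable.
        self.options[key] = value
        return self

    def csv(self, path: str) -> str:
        return f"csv({path}, {self.options})"


class Session:
    """Hypothetical stand-in for a session class such as SparkSession."""

    @property
    def read(self) -> Reader:
        # The return annotation is what a static analyzer can follow;
        # without it, the IDE has to guess what `.read` produces.
        return Reader()


# The hint is visible to tools that inspect the code rather than run it.
print(get_type_hints(Session.read.fget)["return"])

# An explicit variable annotation is an alternative when the library
# itself provides no hint.
reader: Reader = Session().read
print(reader.option("header", "true").csv("sample.csv"))
```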

Related

Reading Apache Arrow files in Spark

I am using PySpark and would like to read Apache Arrow files, which have the ".arrow" extension. Unfortunately I couldn't find any way to do this; I would be grateful for any help.

Can I configure Jupyter Notebook to split source files and generated files?

I really like Jupyter Notebooks.
However, working with them is cumbersome in combination with a source control system like Git, because an .ipynb file contains both the source code (what you actually write in the notebook) and the generated output text / HTML / images / metadata / ...
For example, merge conflicts are difficult to resolve now, because everything is stored in one huge file with lots of generated data.
I wonder if I can configure Jupyter to store notebooks as
A source file: For example, I imagine this to be a Markdown file where everything surrounded by three backticks (```) is interpreted as a code cell. Diffs of that file would be meaningful and merge conflicts would be simple to resolve manually.
A generated file: This contains everything else. If there is a merge conflict within this file, it can be resolved by regenerating it.
Is this possible?
For reference: There is a slightly more general version of this question which lists various efforts at adapting IPython and Jupyter to this effect, and this answer proposes to solve the problem via Git. There is a Github project with a Git filter based on that answer, and (in its edit at the end) the answer links a few similar tools like nbstripout.
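A minimal sketch of the split the question describes, assuming nbformat v4 JSON and using only the standard library: markdown cells are copied verbatim, code cells become triple-backtick fenced blocks, and outputs plus notebook metadata go into a separate generated file. The function names and the `.outputs.json` file name are hypothetical, and this is only the one-way split, not a Jupyter setting.

```python
import json
from pathlib import Path

FENCE = "`" * 3  # triple backtick, built here to avoid a literal fence in this listing


def split_notebook(nb: dict, lang: str = "python") -> tuple:
    """Split nbformat v4 JSON into (markdown source text, generated data)."""
    parts = []
    outputs = []
    for cell in nb.get("cells", []):
        # nbformat stores "source" either as one string or as a list of lines.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            # Code cells become fenced blocks; their outputs are split off,
            # keyed by the cell's position among code cells.
            parts.append(f"{FENCE}{lang}\n{source}\n{FENCE}")
            outputs.append({
                "code_cell": len(outputs),
                "execution_count": cell.get("execution_count"),
                "outputs": cell.get("outputs", []),
            })
        else:
            # Markdown (and raw) cells are copied through unchanged.
            parts.append(source)
    generated = {"metadata": nb.get("metadata", {}), "outputs": outputs}
    return "\n\n".join(parts) + "\n", generated


def split_file(ipynb_path: str) -> None:
    """Write <name>.md (source) and <name>.outputs.json (generated) next to the notebook."""
    path = Path(ipynb_path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    source, generated = split_notebook(nb)
    path.with_suffix(".md").write_text(source, encoding="utf-8")
    path.with_name(path.stem + ".outputs.json").write_text(
        json.dumps(generated, indent=1) + "\n", encoding="utf-8"
    )
```

Rebuilding a notebook from the two files would be the inverse pass. A markdown cell that itself contains a triple-backtick fence would make this format ambiguous, so a real tool would need an escaping rule.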

Configure when new lines are wrapped when formatting code using Scala IDE

Is it possible to configure the Scala formatter so that it does not wrap lines until they reach X characters?
For Java files it is possible to create a new formatter profile, but looking at the Scala editor options this does not seem possible.

IPython (Jupyter) MathJaX preamble

Question
How can I setup a MathJax "preamble" for use in IPython (or Jupyter) notebooks for repeated use in a way that is convenient for others to read my documents (on http://nbviewer.org) and that works for LaTeX/PDF generation?
Background
I would like to use IPython (now Jupyter) notebooks for documents that I later convert to PDF via LaTeX (using ipython nbconvert). The problem is how to include a bunch of macro definitions that I use in almost every document. Something like:
\newcommand{\vect}[1]{\vec{#1}}
\newcommand{\abs}[1]{\lvert#1\rvert}
\DeclareMathOperator{\erf}{erf}
etc. As far as the notebook is concerned, one unsatisfactory solution is to simply include these in a markdown cell at the top of the notebook, embedded between $$ delimiters so that they are interpreted as math. If this is done after some introductory text, it does not even affect the output.
The problem is that, when converting to LaTeX (for PDF export), these commands are embedded in a math environment in the LaTeX file. This has several problems:
Commands like \DeclareMathOperator must come in the LaTeX document preamble.
Command definitions are local to the equation and not available later in the document. (This can be overcome by using \gdef or \global\def but then one must trick MathJax into recognising these commands with something like \let\gdef{\def} which is somehow hidden from LaTeX. Any way I have found of making this work amounts to an ugly hack.)
Sometimes commands are already defined in LaTeX and need \renewcommand (not supported by MathJax, but again this can be provided with \let\renewcommand\newcommand etc., which seems reasonable to me since MathJax cannot know what preamble will be used for the final LaTeX file).
Probably the solution is to provide a set of macros to MathJax by adding code like the following (I am not sure what the equivalent of \DeclareMathOperator is here...)
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  TeX: {
    Macros: {
      vect: ["{\\vec #1}",1],
      abs: ["{\\lvert #1 \\rvert}",1]
    }
  }
});
</script>
to a custom.js file and then providing a LaTeX package for inclusion when converting to PDF. The problem I have with this approach is: How to distribute the custom.js file and LaTeX style file for others (collaborators and viewers) to use?
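One way to at least keep the MathJax config and the LaTeX package in sync is to generate both from a single macro table. A minimal sketch, assuming MathJax v2's TeX.Macros syntax (the same shape as the snippet in the question) and assuming `\operatorname` is an acceptable MathJax stand-in for `\DeclareMathOperator`; the table contents and the package name `notebookmacros` are hypothetical. It removes the duplication, not the distribution problem.

```python
import json

# Hypothetical single source of truth: macro name -> (body, argument count).
MACROS = {
    "vect": (r"\vec{#1}", 1),
    "abs": (r"\lvert#1\rvert", 1),
}
# Names that LaTeX should receive via \DeclareMathOperator.
OPERATORS = {"erf": "erf"}


def mathjax_config(macros: dict, operators: dict) -> str:
    """Render a MathJax v2 TeX.Macros config block."""
    lines = []
    for name, (body, nargs) in macros.items():
        # json.dumps doubles each backslash, as a JavaScript string literal needs.
        lines.append(f"      {name}: [{json.dumps(body)}, {nargs}]")
    for name, text in operators.items():
        # Assumption: \operatorname stands in for \DeclareMathOperator in MathJax.
        body = "\\operatorname{" + text + "}"
        lines.append(f"      {name}: {json.dumps(body)}")
    return (
        '<script type="text/x-mathjax-config">\n'
        "MathJax.Hub.Config({\n"
        "  TeX: {\n"
        "    Macros: {\n"
        + ",\n".join(lines)
        + "\n    }\n  }\n});\n</script>\n"
    )


def latex_package(macros: dict, operators: dict, name: str = "notebookmacros") -> str:
    """Render a .sty file defining the same macros for LaTeX/PDF export."""
    lines = [f"\\ProvidesPackage{{{name}}}", "\\RequirePackage{amsmath}"]
    for macro, (body, nargs) in macros.items():
        args = f"[{nargs}]" if nargs else ""
        lines.append(f"\\newcommand{{\\{macro}}}{args}{{{body}}}")
    for macro, text in operators.items():
        lines.append(f"\\DeclareMathOperator{{\\{macro}}}{{{text}}}")
    return "\n".join(lines) + "\n"


print(mathjax_config(MACROS, OPERATORS))
print(latex_package(MACROS, OPERATORS))
```

The .sty output contains `\newcommand{\vect}[1]{\vec{#1}}` and `\DeclareMathOperator{\erf}{erf}`, so the preamble-only command ends up in a real package rather than inside a math environment.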
I want collaborators to be able to edit and read my documents without having to install custom extensions in their global configuration. To be specific, I am fine with requiring them to run a command like python setup.py configure once they download/check out my code, which makes local modifications to the project such as populating ipython_notebook_config.py files in all directories containing notebooks, but I am not happy installing extensions or modifying their personal global custom.js file.
My stumbling block here is that I don't know how to add contributions from a local custom.js file to the notebook chain, and suspect that this might violate a security policy.
The best solution would not require any action on my collaborator's part.
I want my notebooks to work on http://nbviewer.org, and for people to be able to download the notebook and produce a PDF. (I think this rules out the possibility of using custom.js hacks and a distributed *.sty file, but am not certain.)
I would prefer to be able to simply start a new notebook and start writing without having to insert a bunch of boilerplate code at the start of each notebook, though I would be amenable to a simple way of automating this process using a notebook extension or some hooks in ipython_notebook_config.py.
References
The following posts address some of these issues, but fall short on most fronts:
usepackage and making macros in ipython notebook
Physics bra-ket symbols in IPython (specifically this answer notes related difficulties)
How do I get MathJax to enable the mhchem extension in ipython notebook
Discussions about (potential) problems with the pandoc production of LaTeX files from IPython notebooks:
Getting some problems with pandoc and mathjax
\newcommand environment when convert from markdown to pandoc
Pandoc IPython notebook loses some Mathjax
General discussion of math in notebooks:
How to write LaTeX in IPython Notebook?
I think you can solve some of your problems, but not all.
First, the stumbling block. I believe (though I might be wrong) that nbviewer doesn't look at anything but the notebook itself. For example, I don't see how it could run an ipython_notebook_config.py stored alongside your notebook. So that rules out that line of thought, meaning that I think you'll have to bite the bullet and add boilerplate to every notebook. But you might at least be able to minimize the boilerplate. In that vein:
You could maintain your custom.js (probably under a more descriptive name) on github or whatever, and then add one line of boilerplate to all your notebooks to load that script from the URL. You would still need boilerplate, but it would be a lot shorter.
Once you have executed the code cell containing the javascript, it is saved in the notebook, which means that it will automatically happen the next time the browser loads it, even before the code cell is executed. So unless nbviewer prevents the javascript's execution, it should work just fine. This would also make things work nicely for collaborators, since they wouldn't have to download additional files.
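Adding that one boilerplate cell to every notebook can itself be automated. A minimal sketch using only the standard library and assuming nbformat v4 JSON: it prepends a code cell tagged with a hypothetical `macros-boilerplate` tag unless one is already present, so rerunning it is harmless. The cell's source (the one line that loads your script from its URL) is left as a parameter, and the cell still has to be executed once so its output is saved in the notebook.

```python
import json
from pathlib import Path

# Hypothetical tag used to recognise the boilerplate cell on later runs.
BOILERPLATE_TAG = "macros-boilerplate"


def ensure_boilerplate(nb: dict, source: str) -> bool:
    """Prepend a tagged boilerplate code cell unless one exists; return True if changed."""
    cells = nb.setdefault("cells", [])
    for cell in cells:
        if BOILERPLATE_TAG in cell.get("metadata", {}).get("tags", []):
            return False
    # nbformat v4 code cells need these four keys.
    cells.insert(0, {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": [BOILERPLATE_TAG]},
        "outputs": [],
        "source": source,
    })
    return True


def add_to_directory(root: str, source: str) -> list:
    """Apply ensure_boilerplate to every notebook under root; return changed paths."""
    changed = []
    for path in Path(root).rglob("*.ipynb"):
        if ".ipynb_checkpoints" in path.parts:
            continue  # skip Jupyter's autosave copies
        nb = json.loads(path.read_text(encoding="utf-8"))
        if ensure_boilerplate(nb, source):
            path.write_text(json.dumps(nb, indent=1) + "\n", encoding="utf-8")
            changed.append(str(path))
    return changed
```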
As for your own style file, I suspect that anyone sophisticated enough to install ipython and latex, download your notebook, and run nbconvert on it would also be sophisticated enough to download the .sty file. Anyway, I don't see any way around the need to do that...

Auto-import of packages across a project in Eclipse

I used Eclipse's File Search to replace a specific piece of text, in all files across the workspace, with replacement text that contains a Java method name.
But now in all those files I have to add the import statement (for the method name to resolve).
Is there an automatic way of doing this instead of manually searching and importing the package myself in all files?
P.S.: I can't use Java refactoring, since the text I changed is not a Java element. The Organize Imports option would modify a lot of files (reordering imports), which is a problem when I commit: I would again have to check the diffs manually to see which files have actual changes rather than just reorganized import statements.
Instead of doing this with search and replace, try Refactor -> Rename, which will do it correctly.
Another option is to use Organize Imports; you can run it on an entire project as well (Source -> Organize Imports).
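If neither option fits, a small script run outside Eclipse can add the import only to files that use the new method and leave existing imports in their current order, which keeps the commit diff to one added line per file. A minimal sketch in Python; the import line and the `normalize(` search text are hypothetical placeholders, and matching by plain substring can give false positives (for example in comments, or a same-named method on another class). Refresh the project in Eclipse afterwards so it picks up the changed files.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical placeholders: replace with your method's real import and call text.
IMPORT_LINE = "import static com.example.util.Strings.normalize;"
NEEDLE = "normalize("

# Lines starting with "import ...;" (including static imports) and "package ...;".
IMPORT_RE = re.compile(r"^import\s[^;]*;", re.MULTILINE)
PACKAGE_RE = re.compile(r"^package\s[^;]*;", re.MULTILINE)


def add_import(source: str, import_line: str, needle: str) -> Optional[str]:
    """Return source with import_line added after the last import, or None if unchanged."""
    if needle not in source or import_line in source:
        return None
    imports = list(IMPORT_RE.finditer(source))
    if imports:
        # Append after the last existing import; earlier imports keep their order.
        pos = imports[-1].end()
    else:
        package = PACKAGE_RE.search(source)
        pos = package.end() if package else None
    if pos is None:
        # No package declaration or imports: put the import at the very top.
        return import_line + "\n" + source
    return source[:pos] + "\n" + import_line + source[pos:]


def process_tree(root: str, import_line: str = IMPORT_LINE, needle: str = NEEDLE) -> list:
    """Apply add_import to every .java file under root; return the changed paths."""
    changed = []
    for path in Path(root).rglob("*.java"):
        text = path.read_text(encoding="utf-8")
        updated = add_import(text, import_line, needle)
        if updated is not None:
            path.write_text(updated, encoding="utf-8")
            changed.append(str(path))
    return changed
```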