Context: I'm working with a large text file that is almost excel-like and I'm adding/editing entries. It is a shared file so others can have already edited the file.
I'm working with Emacs and found that it now has the command delete-duplicate-lines. This command seems great for pruning extra entries, but it would be nice to know which lines were duplicates (i.e. already existed in the file) so that I would know which entries had already been added. Is there a command that is similar to delete-duplicate-lines, but only points out which lines are duplicates without removing them?
You can use command hlt-highlight-line-dups-region from the Highlight library, to highlight all sets of duplicate lines in the region or (if no active region) in the buffer.
By default, leading and trailing whitespace are ignored
when checking for duplicates, but this is controlled by option
hlt-line-dups-ignore-regexp. And with a prefix arg the behavior
effectively acts opposite to the value of that option. So if the
option says not to ignore whitespace and you use a prefix arg then
whitespace is ignored, and vice versa.
You can also control the colors/faces used to highlight each set of duplicates.
Problem:
In Emacs configuration modes (e.g. conf-xdefaults-mode or conf-space-mode), some special characters are used in unusual ways, for instance when they define keybindings. This messes up the highlighting for the rest of the buffer.
Example:
The ranger rc.conf file uses conf-space-mode which greatly helps its readability. But lines such as:
map # console shell -p%space
map "<any> tag_toggle tag=%any
mess up the highlighting since # usually defines a comment and is followed by font-lock-comment-face until the end of the line and " defines the beginning of a string and is followed by font-lock-string-face until it encounters a closing quote.
Escaping those characters is not an option because it would prevent them from defining the keybindings.
Possible solution:
The best solution I can think of is to fiddle with font lock settings for those configuration modes to remove the highlighting after those special characters. But I will then lose the proper highlighting after those characters when it is suitable.
A compromise could be to keep highlighting after # since this only messes up one line and there are a lot of comments in those configuration files, while removing the highlighting after single and double quotes since those mess up the entire rest of the buffer and strings are not so common in configuration files.
Question:
What is the proper way to deal with these situations?
Is there a way to reset the highlighting at a point in the buffer? or to insert a character which would affect the highlighting (to fix it) without affecting the code? or is there a way to "escape" some characters for the highlighting only without affecting the code?
The easy way
It's probably easiest to just live with it but keep things constrained. Here, I took ranger's default rc.conf and re-arranged some of the font-lock errors.
Let's ignore the blue "map" for now. We have two visible font-lock errors. The map #... is font-locking as a comment, and the map "... is font-locking as a string to the end of the buffer. The first error is self-constrained: comments end at the end of the line. The second error we constrain by adding a comment. (I don't know if ranger would accept comments in the middle of the line, so I'm only using beginning-of-line comments here.)
The second error is now constrained to one line, but a couple more errors have popped up. Quickly adjusting these we get.
This is something that I could live with, as I'm not in conf files all day long (as opposed to say, source code.) It would be even neater if our new "comments" could be included on the same line.
The hard way
You'll want to use Emacs font-lock-add-keywords. Let's get back to that blue map in the first image. It's rendering blue because conf-space-mode thinks that a string, followed by any amount of whitespace, followed by an opening brace should be rendered in font-lock-type-face (the actual regexp that triggers this is ^[_space__tab_]*\\(.+?\\)[_space__tab_\n]*{[^{}]*?$ where _space_ and _tab_ are actual space and tab characters.)
We can override this in a simple fashion by evaluating
(font-lock-add-keywords
 'conf-space-mode
 '(("^\\<\\(map\\)\\>" 1 font-lock-variable-name-face)))
and reloading the buffer with C-x C-v RET. Now, every time that the word "map" appears at the beginning of a line it's rendered as font-lock-variable-name-face (yellow in our example.)
You can make this change permanent by adding a hook to your init file.
(add-hook 'conf-space-mode-hook
          (lambda ()
            (font-lock-add-keywords
             nil
             '(("^\\<\\(map\\)\\>" 1 font-lock-variable-name-face)))))
This method doesn't appear to work for your comment (#) and string (' ") delimiters as they're defined in the syntax table. Modifying the syntax table to provide special cases is probably more effort than it's worth.
I have this regular expression to find certain keywords on a line:
.*(word1|word2|word3).*
In the find and replace feature of the latest VSCode it works ok and finds the words but it just blanks the lines leaving big gaps in-between.
I would like to delete the entire line including linefeed.
The find and replace feature doesn't seem to support regular expressions in the replace field.
If you want to delete the entire line make your regex find the entire line and include the linefeed as well. Something like:
^.*(word1|word2|word3).*\n?
Then ALT-Enter will select all lines that match and Delete will eliminate them including the lines they occupied.
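To see why the trailing \n? matters, here is a minimal Python sketch run outside VS Code, so it only approximates the editor's regex engine. The function names and the sample text are made up for illustration.

```python
import re

KEYWORDS = r"(word1|word2|word3)"

def blank_matching_lines(text: str) -> str:
    """Original pattern: the line's text is removed but its newline stays."""
    return re.sub(r"^.*" + KEYWORDS + r".*", "", text, flags=re.MULTILINE)

def delete_matching_lines(text: str) -> str:
    """Pattern with \\n? appended: the newline is consumed too."""
    return re.sub(r"^.*" + KEYWORDS + r".*\n?", "", text, flags=re.MULTILINE)

sample = "keep me\nhas word2 inside\nalso keep\nword3 at start\n"

print(repr(blank_matching_lines(sample)))   # empty lines are left behind
print(repr(delete_matching_lines(sample)))  # matching lines are gone entirely
```

The first function reproduces the gaps described in the question; the second removes each matching line together with its line break.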
Say you have the following text:
abc
123
abc
456
789
abc
abc
I want to remove all "abc" lines and just keep one. I don't mind sorting. The result should be like this:
abc
123
456
789
If the order of lines is not important
Sort lines alphabetically, if they aren't already, and perform these steps:
(based on this related question: How do I find and remove duplicate lines from a file using Regular Expressions?)
1. Control+F
2. Toggle "Replace mode"
3. Toggle "Use Regular Expression" (the icon with the .* symbol)
4. In the search field, type ^(.*)(\n\1)+$
5. In the "replace with" field, type $1
6. Click the "Replace All" button.
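The effect of that search/replace pair can be checked outside the editor. This is a rough Python sketch, assuming Python's re module behaves closely enough to VS Code's engine for this pattern; the function name is hypothetical, and Python writes the replacement backreference as \1 instead of $1.

```python
import re

def collapse_adjacent_duplicates(text: str) -> str:
    """Collapse each run of identical consecutive lines into a single line."""
    # ^(.*) captures a whole line, (\n\1)+ matches one or more exact repeats
    # directly below it, and $ makes sure the last repeat is a full line.
    return re.sub(r"^(.*)(\n\1)+$", r"\1", text, flags=re.MULTILINE)

lines = ["abc", "123", "abc", "456", "789", "abc", "abc"]
sorted_text = "\n".join(sorted(lines))  # sorting puts the duplicates side by side

print(collapse_adjacent_duplicates(sorted_text))
```

Because the pattern only sees neighbouring lines, it leaves non-adjacent duplicates alone, which is why the sorting step comes first.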
If the order of lines is important so you can't sort
In this case, either resort to a solution outside VS Code (see here), or - if your document is not very large and you don't mind spamming the Replace All button - follow the previous steps, but in steps 4 and 5, enter these:
(based on Remove specific duplicate lines without sorting)
Caution: This blocks the editor on files with too many lines (1000+), may cause VS Code to crash, and may introduce blank lines in some cases.
search: ((^[^\S$]*?(?=\S)(?:.*)+$)[\S\s]*?)^\2$(?:\n)?
replace with: $1
and then click the "Replace All" button as many times as there are duplicate occurrences.
You'll know it's enough when the line count stops decreasing when you click the button. Navigate to the last line of the document to keep an eye on that.
Coming in vscode v1.62 is a command to eliminate duplicate lines from a selection:
Delete Duplicate Lines in the Command Palette
or
editor.action.removeDuplicateLines as a command in a keybinding
(there is no default keybinding for this command)
Here is a very interesting extension: Transformer
Features:
Unique Lines As New Document
Unique Lines
Align CSV
Align To Cursor
Compact CSV
Copy To New Document
Count Duplicate Lines As New Document
Encode / Decode
Filter Lines As New Document
Filter Lines
Join Lines
JSON String As Text
Lines As JSON String Array
Normalize Diacritical Marks
Randomize Lines
Randomize Selections
Reverse Lines
Reverse Selections
Rotate Backward Selections
Rotate Forward Selections
Select Highlights
Select Lines
Selection As JSON String
Sort Lines By Length
Sort Lines
Sort Selections
Split Lines After
Split Lines Before
Split Lines
Trim Lines
Trim Selections
Unique Lines
Removes duplicate lines from the document. Operates on selection or
current block if no selection.
Unique Lines As New Document
Unique lines are opened in a new document. Operates on selection or
current block if no selection.
I haven't played with it much besides the "Unique Lines" command but it seems quite nicely done (including attempting a macro recorder!).
To add to @Marc.2377's reply.
If the order is important and you don't mind keeping only the last of the duplicate lines, search for the following regexp to remove only duplicate non-empty lines
^(.+)\n(?=(?:.*\n)*?\1$)
If you also want to remove duplicate empty lines, use * instead of +
^(.*)\n(?=(?:.*\n)*?\1$)
and replace with nothing.
This will take a line and try to find ahead some more (maybe 0) lines followed by the exact same line taken. It will remove the taken line.
This is just a one-shot regex. No need to spam the replace button.
This now also takes @awk's comment into account: previously the last line had to end with a linefeed in order to be identified as a duplicate. That is no longer the case, because the \n is now excluded from the captured line and a $ is added after the line found.
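As a quick check of that behaviour, here is a small Python sketch of the non-empty-line variant. It runs outside VS Code, so treat it as an approximation of the editor's engine; the function name is made up for illustration.

```python
import re

def drop_earlier_duplicates(text: str) -> str:
    """Remove every non-empty line that appears again further down."""
    # ^(.+)\n takes a whole line plus its line break; the lookahead then
    # skips zero or more complete lines and requires the same line again.
    pattern = r"^(.+)\n(?=(?:.*\n)*?\1$)"
    return re.sub(pattern, "", text, flags=re.MULTILINE)

text = "abc\n123\nabc\n456\n789\nabc\nabc"  # note: no trailing linefeed

print(drop_earlier_duplicates(text))
```

Only the last "abc" survives, and it is found even though the final line has no trailing linefeed.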
I just had the same issue and found the Visual Studio Code package "Sort lines". See the Visual Studio Code market place for details (e.g. Sort lines).
This package has the option "Sorting lines (unique)", which did it for me. Take care of any white spaces at the beginning/end of lines. They influence whether lines are considered unique or not.
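The whitespace caveat is easy to reproduce. The following Python sketch is not the extension's implementation, only an illustration of the same idea; the function name and its strip parameter are hypothetical.

```python
def sort_unique(lines, strip=False):
    """Return the distinct lines in sorted order.

    With strip=True, leading and trailing whitespace is removed first,
    so lines that differ only in surrounding whitespace count as equal.
    """
    if strip:
        lines = [line.strip() for line in lines]
    return sorted(set(lines))

lines = ["abc", "abc ", " abc", "123"]

print(sort_unique(lines))              # three "abc" variants are kept apart
print(sort_unique(lines, strip=True))  # they collapse into one entry
```

Trimming lines before running a unique-sort avoids the surprise of near-identical entries surviving.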
Install the DupChecker extension, hit F1, and type "Check Duplicates".
It will check for duplicates and ask if you want to remove them.
Try find and replace with a regular expression.
Find:
^(.+)((?:\r?\n.*)*)(?:\r?\n\1)$
Replace:
$1$2
It is possible to introduce some variance in the first group.
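A rough way to try this pair outside the editor is the Python sketch below. It assumes Python's re module is a close enough stand-in for VS Code's engine, and the function name is hypothetical. In this approximation one pass removes only one later duplicate per match, because a match spans everything between the first occurrence and the duplicate, so the sketch repeats the substitution until nothing changes.

```python
import re

# Group 1: a line; group 2: any lines in between; then the same line again.
PATTERN = re.compile(r"^(.+)((?:\r?\n.*)*)(?:\r?\n\1)$", re.MULTILINE)

def drop_later_duplicates(text: str) -> str:
    """Keep the first occurrence of each non-empty line, in original order."""
    previous = None
    while previous != text:
        previous = text
        # Python spells the $1$2 replacement as \1\2.
        text = PATTERN.sub(r"\1\2", text)
    return text

text = "abc\n123\nabc\n456\n789\nabc\nabc"

print(drop_later_duplicates(text))
```

Unlike the lookahead approach, this keeps the first copy of each line rather than the last.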
If you don't mind some Vim in your VS Code, you can install a Vim emulation plugin.
Then you can use vim commands
:sort u
It will sort lines and it will remove duplicates
Sublime Text 3
It has blisteringly fast native permutation functions.
Edit > Permute Lines > Unique or ⇧⌘U, and
Edit > Permute Selections > Unique
Visual Studio Code is my daily driver. But, I keep Sublime Text on standby for these situations.
Not actually in Visual Studio Code, but if it works, it works.
Open a new Excel spreadsheet
Paste the data into a column
Go to the Data tab
Select the column of data (if you haven't already)
Click Remove Duplicates (somewhat in the middle of the bar)
Click OK to remove duplicates.
It is not the best answer, as you specified Visual Studio Code, but as I said: If it works, it works :)
Why does the compare files in Eclipse show difference between an identical line that starts and/or stops with white spaces?
Why would anyone ever want this "feature" anyway, all lines become marked as different. Must be a bug.
I know I can use the ignore white spaces setting, but then it ignores the differences in block indentation as well and I don't want that.
Forgot to answer this one.
It was an extra carriage return on one side (one file had PC newlines, the other Mac/*nix), which made the comparison disregard the "ignore whitespace" setting for those lines.
At my new job I'm using Emacs 24 on Windows, and its chief use for me in these particular circumstances is as a file manager.
I'd like to jettison everything from the Dired display except filename, size, and date. This question showed me how to use ls-lisp-verbosity to remove most of the detail that I don't want.
But I haven't found a way to keep from displaying the permissions. I've checked the documentation for ls and for dir, and there doesn't seem to be a flag for "don't show permissions". And so far I haven't found anything in Dired that will omit the permissions. Can this be done?
Your best option is to change the ls switches so that Dired does not list those fields. See M-x man ls for your particular platform, to see what ls switches are available to you.
dired-details.el (and dired-details+.el) are no longer needed if you have Emacs 24.4 (or a pre-release development snapshot). Just use ( to toggle between showing and hiding details.
And in that case you have at least two options to control whether symbolic-link targets or all lines except header and file lines are considered details to hide: dired-hide-details-hide-symlink-targets and dired-hide-details-hide-information-lines.
If changing ls switches does not help in your case, then you would need to tweak function dired-details-make-current-line-overlay from dired-details.el. The details to be hidden are determined by the first cond clause, which is this (wrapped in ignore-errors):
(dired-move-to-filename t)
That moves point to the beginning of the file name. The next line is this:
(make-overlay (+ 2 bol) (point))
That creates the invisibility overlay from the beginning of the line (bol here) up to the beginning of the file name (point).
If you want something different then you need to get the limits that you want for the overlay. For example, if you want invisibility to start at the file size, then you would search forward with a regexp that finds the beginning of the file size.
You can come up with such a regexp by working from the regexp for dired-move-to-filename-regexp (in library dired.el). It is a very complex regexp that matches everything up to the file name. But you can use it to find the date+time portion, which is either the 7th matching regexp subgroup or the 2nd, depending on whether the date+time is expressed using a locale (western or eastern) or using ISO representation.
You can see how this is handled in the code defining variable diredp-font-lock-keywords-1 of library dired+.el.
But again, the best approach, if it does what you want, is to try to use ls switches to control which fields are listed in the first place. You can easily experiment with switches by using a prefix argument with C-x d - you are prompted for the switches to use.