How can I remove duplicate lines in Visual Studio Code? - visual-studio-code

Say you have the following text:
abc
123
abc
456
789
abc
abc
I want to remove all "abc" lines and just keep one. I don't mind sorting. The result should be like this:
abc
123
456
789

If the order of lines is not important
Sort lines alphabetically, if they aren't already, and perform these steps:
(based on this related question: How do I find and remove duplicate lines from a file using Regular Expressions?)
1. Press Ctrl+F.
2. Toggle "Replace mode".
3. Toggle "Use Regular Expression" (the icon with the .* symbol).
4. In the search field, type ^(.*)(\n\1)+$
5. In the "replace with" field, type $1
6. Click the "Replace All" button.
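The same search-and-replace can be tried outside the editor. Here is a minimal Python sketch, assuming Python's `re` module handles this pattern the same way VS Code's regex engine does. The function name is made up for illustration, and the sample is the text from the question.

```python
import re

def collapse_adjacent_duplicates(text: str) -> str:
    """Collapse runs of identical adjacent lines into a single line."""
    # ^(.*) captures one whole line; (\n\1)+ matches one or more exact
    # repeats directly below it; $ stops "abc" from matching "abcd".
    return re.sub(r"^(.*)(\n\1)+$", r"\1", text, flags=re.MULTILINE)

lines = ["abc", "123", "abc", "456", "789", "abc", "abc"]
sorted_text = "\n".join(sorted(lines))  # sorting makes duplicates adjacent
print(collapse_adjacent_duplicates(sorted_text))
```

Sorting first matters: the pattern only sees duplicates that sit directly below each other.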
If the order of lines is important so you can't sort
In this case, either resort to a solution outside VS Code (see here), or - if your document is not very large and you don't mind spamming the Replace All button - follow the previous steps, but in steps 4 and 5, enter these:
(based on Remove specific duplicate lines without sorting)
Caution: This can hang on files with many lines (1000+), may cause VS Code to crash, and may introduce blank lines in some cases.
search: ((^[^\S$]*?(?=\S)(?:.*)+$)[\S\s]*?)^\2$(?:\n)?
replace with: $1
and then click the "Replace All" button as many times as there are duplicate occurrences.
You'll know it's enough when the line count stops decreasing when you click the button. Navigate to the last line of the document to keep an eye on that.
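For the "solution outside VS Code" route, here is a minimal Python sketch that keeps the first occurrence of every line without sorting. It is only an illustration, not the solution linked above, and the function name is hypothetical.

```python
def unique_lines_keep_first(text: str) -> str:
    """Drop repeated lines, keeping the first occurrence of each."""
    # dict preserves insertion order (Python 3.7+), so fromkeys() acts
    # as an ordered set of the lines. Blank lines count as lines too.
    return "\n".join(dict.fromkeys(text.split("\n")))

sample = "abc\n123\nabc\n456\n789\nabc\nabc"
print(unique_lines_keep_first(sample))
```

Unlike the regex, this makes a single pass over the text, so large files should not be a problem.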

VS Code v1.62 adds a command to eliminate duplicate lines from a selection:
Delete Duplicate Lines in the Command Palette
or
editor.action.removeDuplicateLines as a command in a keybinding
(there is no default keybinding for this command)
Here is a very interesting extension: Transformer
Features:
Unique Lines As New Document
Unique Lines
Align CSV
Align To Cursor
Compact CSV
Copy To New Document
Count Duplicate Lines As New Document
Encode / Decode
Filter Lines As New Document
Filter Lines
Join Lines
JSON String As Text
Lines As JSON String Array
Normalize Diacritical Marks
Randomize Lines
Randomize Selections
Reverse Lines
Reverse Selections
Rotate Backward Selections
Rotate Forward Selections
Select Highlights
Select Lines
Selection As JSON String
Sort Lines By Length
Sort Lines
Sort Selections
Split Lines After
Split Lines Before
Split Lines
Trim Lines
Trim Selections
Unique Lines
Removes duplicate lines from the document. Operates on the selection, or on the current block if there is no selection.
Unique Lines As New Document
Opens the unique lines in a new document. Operates on the selection, or on the current block if there is no selection.
I haven't played with it much besides the "Unique Lines" command but it seems quite nicely done (including attempting a macro recorder!).

To add to @Marc.2377's reply:
If the order is important and you don't mind keeping only the last of the duplicate lines, search for the following regex. This version removes only duplicate non-empty lines:
^(.+)\n(?=(?:.*\n)*?\1$)
If you also want to remove duplicate empty lines, use * instead of +
^(.*)\n(?=(?:.*\n)*?\1$)
and replace with nothing.
This takes a line and looks ahead across zero or more lines for an exact copy of it. If a copy is found, the line taken is removed.
This is a one-shot regex. No need to spam the replace button.
This now also takes @awk's comment into account: previously the last line needed a trailing linefeed in order to be identified as a duplicate. That is no longer the case, because the \n is excluded from the captured line and a $ is added after the backreference.
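To see the one-shot behaviour, the non-empty variant can be run in Python. This is a sketch that assumes Python's `re` module matches this pattern the way VS Code does; the names `KEEP_LAST` and `drop_earlier_duplicates` are made up for illustration.

```python
import re

# ^(.+)\n takes one non-empty line plus its linefeed; the lookahead skips
# zero or more whole lines and then requires an exact copy of that line.
KEEP_LAST = re.compile(r"^(.+)\n(?=(?:.*\n)*?\1$)", re.MULTILINE)

def drop_earlier_duplicates(text: str) -> str:
    """Remove every line that occurs again later, in a single pass."""
    return KEEP_LAST.sub("", text)

sample = "abc\n123\nabc\n456\n789\nabc\nabc"
print(drop_earlier_duplicates(sample))
```

Because the last copy of each line survives, "abc" ends up at the bottom of the sample rather than at the top.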

I just had the same issue and found the Visual Studio Code package "Sort lines". See the Visual Studio Code Marketplace for details (e.g. Sort lines).
This package has the option "Sorting lines (unique)", which did it for me. Watch out for whitespace at the beginning or end of lines: it influences whether lines are considered unique or not.

Install the DupChecker extension, hit F1, and type "Check Duplicates".
It will check for duplicates and ask if you want to remove them.

Try find and replace with a regular expression.
Find:
^(.+)((?:\r?\n.*)*)(?:\r?\n\1)$
Replace:
$1$2
It is possible to introduce some variance in the first group.
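Each replacement with this pattern removes only the last duplicate of the first line that has one, so it has to be repeated until nothing changes. Here is a hedged Python sketch of that loop, assuming Python's `re` behaves like VS Code's engine here; the names are hypothetical.

```python
import re

# Group 1 is a line, group 2 is everything between it and its last copy,
# and the final non-capturing group is that last copy, which gets dropped.
LAST_DUP = re.compile(r"^(.+)((?:\r?\n.*)*)(?:\r?\n\1)$", re.MULTILINE)

def remove_duplicates_keep_first(text: str) -> str:
    """Apply the replacement repeatedly until the text stops changing."""
    while True:
        reduced = LAST_DUP.sub(r"\1\2", text)
        if reduced == text:
            return text
        text = reduced

sample = "abc\n123\nabc\n456\n789\nabc\nabc"
print(remove_duplicates_keep_first(sample))
```

The loop plays the role of clicking "Replace All" several times in the editor. On long inputs the pattern may backtrack heavily.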

If you don't mind some Vim in your VS Code, you can install a Vim emulation plugin.
Then you can use the Vim command
:sort u
It will sort lines and it will remove duplicates

Sublime Text 3
It has blisteringly fast native permutation functions.
Edit > Permute Lines > Unique or ⇧⌘U, and
Edit > Permute Selections > Unique
Visual Studio Code is my daily driver. But, I keep Sublime Text on standby for these situations.

Not actually in Visual Studio Code, but if it works, it works.
Open a new Excel spreadsheet
Paste the data into a column
Go to the Data tab
Select the column of data (if you haven't already)
Click Remove Duplicates (somewhat in the middle of the bar)
Click OK to remove duplicates.
It is not the best answer, as you specified Visual Studio Code, but as I said: If it works, it works :)

Related

Visual Studio Code select same position above and below not the whole line before and after (see image)

I'm using Visual Studio Code and have run into a weird problem. I'm not sure how I got here; I may have accidentally pressed a shortcut.
I'm trying to select a phrase, link or anything that crosses multiple lines (whether the lines are true lines or due to word wrap). When I select multiple lines, it doesn't automatically select the text at the start and end between the two points. Rather, it just selects the length of text for that line and repeats it in the subsequent lines. See the image below to understand.
Image of issue
As you can see, I am trying to select the words from "the" to the end of "sub". Instead of selecting all the words between the two, it selects the text "the instru" and selects every line with the same amount of characters/length.
In order to show what I am expecting, I have pasted the text into Notepad and done the same thing.
What I am expecting
As you can see, all the words between "the" and "sub" are selected.
If anyone has any idea about how to fix this, I would be greatly appreciative.
Below is a copy of the text if the images don't display.
Follow the instructions below for a click guide to retire and/or add 'School'.
Best practice if there is a change in 'School' structure would be to 'retire' any existing school setup that is no longer required and add the new sub school information. The reason why we don't just edit existing school names (typically) is due to leaving historical data intact.
Try pressing Ctrl+Shift+P and running "Toggle Column Selection Mode".

Select nonadjacent lines containing a common phrase in vscode

I have an HTML file that has around 700 of my bookmarks. Each line has a link and a tag, like the following:
<li>Strunk, William, Jr. 1918. The Elements of Style</li>
The file has multiple lines with the same tags. I want to group the lines with the same tag next to each other. I was trying to do it in vscode. I can select multiple occurrences of the same phrase with Ctrl+Shift+L, but I could not select the lines. Is there a way for doing this?
After your comment below that clarified what you are trying to do I think you will find this easier than your solution.
1. Select the text to check.
2. Press Ctrl+Shift+L to select all occurrences. The command is Select All Occurrences of Find Match. If that is bound to something else on your OS, use that.
3. Press Ctrl+L to select the entire line. (Changed from Ctrl+I in Feb. 2019.) This uses the command Expand Line Selection. Again, find that command in your Keyboard Shortcuts and use whatever it is bound to.
4. Cut and paste them where you want.
There is also an extension vscode-dup-checker that will find and delete duplicate lines. I don't know if you actually want to delete the duplicates though.
I added a gif to show it in action - it only uses steps 1-4 above:
OK, I found one method that works. I don't know if it is the best, though.
After Ctrl+Shift+L, you have a cursor on every line with that phrase. Pressing Home then moves all the cursors to the beginning of their lines, and Shift+End selects each of those lines. Then cut the text and paste it wherever you wish. This turned out to be pretty useful while I was editing an HTML file with 700 links.
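If the file gets too large for multi-cursor editing, the same "select the matching lines, cut, paste them together" idea can be scripted. Below is a minimal Python sketch; the function name and the bracketed tags in the sample are hypothetical stand-ins, since the real bookmark lines are not shown in full.

```python
def group_lines_containing(text: str, phrase: str) -> str:
    """Move every line containing `phrase` next to the first such line."""
    lines = text.split("\n")
    matching = [line for line in lines if phrase in line]
    if not matching:
        return text  # nothing to group
    first = lines.index(matching[0])
    before = lines[:first]
    after = [line for line in lines[first:] if phrase not in line]
    return "\n".join(before + matching + after)

# Hypothetical bookmark lines; "[style]" and "[python]" stand in for tags.
bookmarks = "\n".join([
    "<li>Book A [style]</li>",
    "<li>Book B [python]</li>",
    "<li>Book C [style]</li>",
])
print(group_lines_containing(bookmarks, "[style]"))
```

The relative order of the matching lines and of the remaining lines is preserved.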

How do you delete lines with certain keywords in VScode

I have this regular expression to find certain keywords on a line:
.*(word1|word2|word3).*
In the find and replace feature of the latest VSCode it works ok and finds the words but it just blanks the lines leaving big gaps in-between.
I would like to delete the entire line including linefeed.
The find and replace feature doesn't seem to support regular expressions in the replace field.
If you want to delete the entire line make your regex find the entire line and include the linefeed as well. Something like:
^.*(word1|word2|word3).*\n?
Then ALT-Enter will select all lines that match and Delete will eliminate them including the lines they occupied.
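The effect of including the linefeed in the match can be checked with a short Python sketch. It assumes Python's `re` module treats the pattern like VS Code does, and the helper name is made up for illustration.

```python
import re
from typing import Iterable

def delete_matching_lines(text: str, keywords: Iterable[str]) -> str:
    """Delete every line containing a keyword, including its linefeed."""
    alternation = "|".join(re.escape(word) for word in keywords)
    # ^.* and .* cover the whole line; \n? also consumes the linefeed so
    # no blank line is left behind (the ? handles a final line without one).
    pattern = re.compile(rf"^.*(?:{alternation}).*\n?", re.MULTILINE)
    return pattern.sub("", text)

sample = "keep\nhas word1 here\nalso keep\nword3\n"
print(delete_matching_lines(sample, ["word1", "word2", "word3"]))
```

Without the trailing `\n?`, the matched lines would be emptied but their linefeeds would stay, which is the "big gaps" effect described in the question.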

How do I get a cursor on every line in vscode

I'm trying to use the multi cursor functionality of vscode on a large(ish) file.
The file is too large to select every line individually with Ctrl+Alt+Up or Down. In Sublime Text I would select everything and press Ctrl+Shift+L. Is there something similar in VS Code? I've tried using a regex search for ^, but that gives me an error stating "Expression matches everything".
The command Selection > Add Cursors to Line Ends (Alt+Shift+I) will put a cursor on every line in the current selection. (On Mac, use Opt+Shift+I.)
Tip: You can pull up the list of keyboard shortcuts with Ctrl+K, Ctrl+S (as in, those two keyboard combos in sequence).
(On Mac, use Cmd+K, Cmd+S.)
Hold Alt+Shift and select the block. Then press End or Right button.
You get selected individual lines.
I use version VSCode 1.5.3 in Windows.
Press Alt+Shift+I.
Then press End (fn+→ on Mac) to move the cursors to the right-most position, or Home (fn+← on Mac) for the left-most.
This feature is actually called split selection into lines in many editors.
Sublime Text uses the default keybinding Ctrl+Shift+L.
VS Code uses Alt+Shift+I.
For Atom you actually need to edit your keymap to something like this:
'.platform-win32 .editor, .platform-linux .editor':
  'ctrl-shift-L': 'editor:split-selections-into-lines'
Real Lines vs Display Lines
To fully understand the answer, we first have to understand the difference between Real Lines and Display Lines.
When Word Wrap is enabled, each line of text that exceeds the width of the window will display as wrapped. As a result, a single line in the file may be represented by multiple lines on the display.
The easiest way to tell the difference between Real Lines and Display Lines is by looking at the line number in the left margin of the text editor. Lines that begin with a number correspond to the real lines, which may span one or more display lines. Each time a line is wrapped to fit inside the window, it begins without a line number.
Cursor at the beginning of each Display Line:
Cursor at the beginning of each Real Line:
Answer to the Question
Now that we know the difference between Display Lines and Real Lines, we can now properly answer the actual question.
Hold Alt+Shift and select the text block.
Press Home to put a cursor at the beginning of every Display Line.
Press End to put a cursor at the end of every Display Line.
Press Home, Home (Home twice) to put a cursor at the beginning of every Real Line.
Press End, End (End twice) to put a cursor at the end of every Real Line.
Note that Alt+Shift+I puts a cursor at the end of every Real Line.
Install the extension Sublime Commands.
[Sublime Commands] Adds commands from Sublime Text to VS Code: Transpose, Expand Selection to Line, Split into Lines, Join Lines.
(Don't forget to add the keybinding(s) from the extensions details page to your keybindings.json)
Doesn't VS Code already have a "split into lines" command?
Yes, yes it does. However it differs from the one in Sublime.
In VS Code, when you split into lines your selection gets deselected and a cursor appears at the end of each line that was selected (except for the last line where the cursor appears at the end of the selection).
In Sublime, when you split into lines a cursor appears at the end of each line (with the same exception as in VS Code) and the selection is divided on each line and "given" to the same line.
I have the same problem. I'm used to Alt + drag for 'box selections' in Visual Studio, but it doesn't work in Code.
For now it seems impossible to do this other than by selecting every single line.
However, plugins should be supported soon, so we will likely see a plugin for this if it is not implemented directly by Microsoft.
From visual studio uservoice forums:
We plan to offer plugin support for Visual Studio Code. Thank you for your interests and look for more details in our blog in the coming weeks. http://blogs.msdn.com/b/vscode.
For the preview we are looking for exactly this type of feedback. Keep it coming.
Sean McBreen – VS Code Team Member

How to do search and replace involving fields in Microsoft Word?

I have a Word document with fields of the reference variety, which occur in the form "[field].[field]"--in other words, there's a period between the two fields. I want to globally replace this with a space.
Word offers the ^d special character to search for fields, but for some reason the query "^d.^d" does not find anything. However, ".^d" does. Now comes the problem, however--what do I specify as the replacement text in order to retain the field code? If using regular expressions, I could use a "Find What Expression" such as \1, but with regexp ("wild card") mode the ^d is not permitted.
I guess I could write a macro...
I would like to add to Bibadia's solution.
An example of an index entry field; we want to change a name we misspelled.
Make sure hidden formatting is displayed (toggle with Ctrl+Shift+8).
Make sure the wildcards option is not selected. To search for fields, use the opening and closing field brace codes (optionally use ^w for spaces, as Bibadia suggested):
^19 XE "Deo, John" ^21
Replace won't recognize the field brace characters, but it will let you insert the clipboard's contents. To do that, insert the correct entry in the text: press Ctrl+F9 to insert a field and type:
XE "Doe, John"
Select the field above and copy
Use ^c in the replace box
Hit Replace All
Ta-da!
It's usually better to go the macro route when finding fields because, as you say, the find algorithm that Word uses doesn't work the way you might hope with fields.
But if you know exactly what the fields contain, you can specify a search pattern that will probably work (however not in wildcard mode).
For example, if you want to look for figure number field pairs such as
{ STYLEREF 1 \s }.{ SEQ Figure \* ARABIC \s 1 }
(which would typically be the same set of fields everywhere in the document)
If you only really need to look for the following:
{ STYLEREF 1 \s }.<any field>
you could ensure that field codes are displayed and search for
^d STYLEREF 1 \s ^21.^d
or
^19 STYLEREF 1 \s ^21.^19
If you need to be more precise, you can spell out the second field as well.
"^d" only works for finding the field beginning, not the field end.
It's a shame that ^w wants to find at least 1 whitespace character because otherwise it would be more robust to look for
^19^wSTYLEREF^w1^w\s^w^21.^19
Perhaps someone else knows how to work around that without using wildcards?
Torzaburo,
I suggest that you do this using a macro. You can start by recording the macro, and later refining your processing steps within the macro.
First turn on the hidden characters by navigating to Home > Paragraph and toggling the show/hide Paragraph symbol. Also, select all and toggle the field codes on (right-click and select "Toggle Field Codes").
Open a new blank Word doc in addition to the one you have open. You will use this later. Start the macro recording and find the field using the "^d" (field code) as you said.
When the field is found, copy only the field text within the brackets, and not the full field reference. While the macro is still recording, ALT + TAB to the new blank document and paste the field code in as plain text.
At this point, do the necessary find & replace processing to the field codes. Highlight the processed field codes, copy, ALT + TAB back to the original document, and paste back between the { } brackets.
Stop the macro recording. Add any further custom processing to the macro VBA.
Select-All and re-toggle the field codes. Update the field codes.
You don't need a macro. Just toggle all field codes on by using Alt+F9. Then do a find and replace for what you want to change. Once the replacement is complete, use Alt+F9 again to toggle the field codes back off.
Disclaimer: I didn't originate this solution, but it's clean and elegant and I thought it should be included here:
(Adapted from Search & Replace Field Codes in Word):
1. Create or find a single instance of the field you want to convert text to.
2. Toggle Field Codes visible (Alt+F9).
3. Copy the code for the field you want to use to the Clipboard (highlight and Ctrl+C).
4. Open the Replace dialog box (Ctrl+H), insert the text you want to replace in the Find What box and then enter ^c in the Replace With box.
This will replace your text with the contents of the Clipboard, turning it into the field code you copied in step 3. It also copies formatting information (font, color, etc.), to control how the field will appear when hidden. (Caveat: I've tested this with Word 2003 under Windows 7 only.)
Coming in late on this, probably way too late for Beth (sorry Beth). And this may not be quite what Beth was looking for. But for anyone interested ...
It sounds like Beth may have created captions throughout the document using INSERT CAPTION (hence the presence of field codes). This means these captions will have been (automatically) created in CAPTION style.
To globally replace the separator "." with " " (space) in such captions, take two steps:
[1] Go to REFERENCES | INSERT CAPTION, then click on NUMBERING and replace the SEPARATOR "." with "EM-DASH". This will replace all separators in captions for the selected label in the CAPTION Window. If you have other labels in use in the document (e.g. FIGURE), select the other labels one by one and repeat this process.
[2] Do a find/replace searching for special character "em-dash" (^+) in style CAPTION, replacing with " ". Click REPLACE ALL.
Voila!
NOTE: This presumes that em-dash does not appear in the caption text anywhere. If it does, then you'll need to do a pre- and post- "fiddle" to ensure these em-dashes are not touched by the global replace above.
The "pre-fiddle" is to do a global find/replace across captions, replacing the em-dash ("^+") with some other string (e.g. "EM-DASH") that doesn't ever occur in any caption's text. Then you do the separator change as described above. Finally, the "post-fiddle" is to restore the em-dashes that were in the captions, by doing a global replace of the string "EM-DASH" with the actual em-dash character "^+".