for loop in fish shell only goes a maximum of 3 times - fish

I'm using fish shell to do a simple for loop. For some reason it only iterates three times instead of 100 (or whatever number I put in there). What am I doing wrong?
error ➜ for i in seq 1 100
echo hi
end
hi
hi
hi
error ➜
Note that running seq 1 100 on its own will, as expected, output the numbers from 1 to 100.
Here's my fish version:
error ➜ fish --version
fish, version 3.0.2

That's because you're not launching the seq command.
You are doing this
for i in "seq" "1" "100"
when you want to do this
for i in (seq 1 100)
# .......^.........^ command substitution
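The same distinction shows up in bash and other POSIX-style shells, where command substitution is written $(...) rather than fish's (...). A minimal sketch, written in POSIX shell syntax rather than fish (that choice is an assumption made for illustration, not something from the question), which counts the iterations of both forms:

```shell
# Form 1: no command substitution, so the loop sees three literal words.
literal=0
for i in seq 1 100; do
    literal=$((literal + 1))
done

# Form 2: $(...) runs seq, and the loop sees its 100 output words.
substituted=0
for i in $(seq 1 100); do
    substituted=$((substituted + 1))
done

echo "literal words: $literal"
echo "command substitution: $substituted"
```

The first loop runs once per word ("seq", "1", "100"), which is exactly the three iterations seen in the question.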

hmmm so the answer to my question is very simple.
I should have (seq 1 100) instead of seq 1 100.
It needs to be enclosed in parentheses for the seq command to be evaluated. What I actually gave fish was a list of three strings ("seq" "1" "100") that happened to look like a command lol.
So yeah I am not a very intelligent person.

Related

How to append more than 33 files in a gcloud bucket?

I used to append datasets in a bucket in gcloud using:
gsutil compose gs://bucket/obj1 [gs://bucket/obj2 ...] gs://bucket/composite
However, today when I tried to append some data, the terminal printed the error CommandException: The compose command accepts at most 33 arguments.
I didn't know about this restriction. How can I append more than 33 files in my bucket? Is there another command-line tool? I would like to avoid creating a virtual machine for what looks like a rather simple task.
I checked the help using gsutil help compose, but it didn't help much. There is only a warning saying "Note that there is a limit (currently 32) to the number of components that can be composed in a single operation." but no hint of a workaround.
Could you not do it recursively, in batches?
I've not tried this.
Given an arbitrary list of files (FILES):
While there is more than 1 file in FILES:
Take the first n files from FILES, where n<=32 (32 sources plus 1 destination makes the 33 arguments), and gsutil compose them into a temp file.
If that succeeds, replace those n names in FILES with the 1 temp file.
Repeat.
The file that remains is everything composed.
Update
The question piqued my curiosity and gave me an opportunity to improve my bash ;-)
A rough-and-ready proof-of-concept bash script that generates batches of gsutil compose commands for arbitrary (limited by the string formatting %04) numbers of files.
GSUTIL="gsutil compose"
BATCH_SIZE="32"
# These may be the same bucket, or empty for no bucket
SRC="gs://bucket01/"
DST="gs://bucket02/"
# Generate a test list of object names
FILES=()
for N in $(seq -f "%04g" 1 100); do
  FILES+=("${SRC}file-${N}")
done
function squish() {
  local LST=("$@")
  local LEN=${#LST[@]}
  if [ "${LEN}" -le 1 ]; then
    # Zero or one file; nothing to compose
    return 0
  fi
  # Name is only unique for this configuration; be careful
  local COMPOSITE
  COMPOSITE=$(printf "%scomposite-%04d" "${DST}" "${LEN}")
  if [ "${LEN}" -le "${BATCH_SIZE}" ]; then
    # Batch can be composed with one command
    echo "${GSUTIL} ${LST[*]} ${COMPOSITE}"
    return 0
  fi
  # Compose 1st batch of files
  # NB Provide start:size
  echo "${GSUTIL} ${LST[*]:0:${BATCH_SIZE}} ${COMPOSITE}"
  # Remove batch from LST
  # NB Provide start (to end is implied)
  local REM=("${LST[@]:${BATCH_SIZE}}")
  # Prepend composite from above batch to the next run
  local NXT=("${COMPOSITE}" "${REM[@]}")
  squish "${NXT[@]}"
}
squish "${FILES[@]}"
Running with BATCH_SIZE=3, no buckets and 12 files yields:
gsutil compose file-0001 file-0002 file-0003 composite-0012
gsutil compose composite-0012 file-0004 file-0005 composite-0010
gsutil compose composite-0010 file-0006 file-0007 composite-0008
gsutil compose composite-0008 file-0008 file-0009 composite-0006
gsutil compose composite-0006 file-0010 file-0011 composite-0004
gsutil compose composite-0004 file-0012 composite-0002
Note how composite-0012 is created by the first command and then knitted into the subsequent command.
I'll leave it to you to improve throughput: instead of threading the output of each step into the next, chop the list into batches, run the gsutil compose commands for those batches in parallel, and then compose the batch results.
The docs say that you may only combine 32 components in a single operation, but there is no limit to the number of components that can make up a composite object.
So, if you have more than 32 objects to concatenate, you may perform multiple compose operations, composing 32 objects at a time until you eventually get all of them composed together.
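The "32 at a time" idea can also be sketched level by level: compose each batch of up to 32 objects into an intermediate object, then compose the intermediates, repeating until one batch remains. The following is only a dry-run sketch that prints the commands rather than running them. The function name plan_compose, the .tmp-LEVEL-BATCH naming scheme, and the bucket paths are all hypothetical, and it assumes object names contain no whitespace:

```shell
BATCH_SIZE=32

# usage: plan_compose DESTINATION SOURCE...
plan_compose() {
    dst=$1; shift
    level=0
    while [ "$#" -gt "$BATCH_SIZE" ]; do
        count=$#          # objects still to be batched at this level
        batch=0
        while [ "$count" -gt 0 ]; do
            n=$BATCH_SIZE
            [ "$count" -lt "$n" ] && n=$count
            # Take the next n names off the front of the argument list
            srcs=
            i=0
            while [ "$i" -lt "$n" ]; do
                srcs="$srcs $1"; shift
                i=$((i + 1))
            done
            tmp="${dst}.tmp-${level}-${batch}"   # hypothetical naming scheme
            echo "gsutil compose${srcs} ${tmp}"
            # Queue the intermediate object for the next level
            set -- "$@" "$tmp"
            count=$((count - n))
            batch=$((batch + 1))
        done
        level=$((level + 1))
    done
    echo "gsutil compose $* ${dst}"
}

# Small demonstration: three objects fit into a single compose command
plan_compose gs://bucket/all gs://bucket/a gs://bucket/b gs://bucket/c
```

Because the commands within one level do not depend on each other, they are the ones that could be run in parallel. The intermediate objects would still need to be deleted afterwards.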

Sorting and removing duplicates from single or multiple large files

I have a 70 GB file with 400 million+ lines (JSON). My end goal is to remove duplicate lines so I have a fully "de-duped" version of the file. I am doing this on a machine with 8 cores and 64 GB RAM.
I am also expanding on this thread, 'how to sort out duplicates from a massive list'.
Things I have tried:
Neek - JavaScript quickly runs out of memory
Using awk (doesn't seem to work for this)
Using Perl (perl -ne 'print unless $dup{$_}++;') - again, runs out of memory
sort -u largefile > targetfile
does not seem to work. I think the file is too large.
Current approach:
Split the files into chunks of 5million lines each.
Sort/Uniq each of the files
for X in *; do sort -u --parallel=6 $X > sorted/s-$X; done
Now I have 80 individually sorted files. I am trying to re-merge/de-dupe them using sort -m. This seems to do nothing, as the file/line size ends up being the same.
Since sort -m does not seem to work, I am currently trying this:
cat *.json | sort > big-sorted.json
then I will try to run uniq with
uniq big-sorted.json > unique-sorted.json
Based on past experience, I do not believe this will work.
What is the best approach here? How do I re-merge the files and remove any duplicate lines at this point?
Update 1
As I suspected, cat * | sort > bigfile did not work. It just copied everything to a single file the way it was previously sorted (in individual files).
Update 2:
I also tried the following code:
cat *.json | sort --parallel=6 -m > big-sorted.json
The result was the same as the previous update.
I am fresh out of ideas.
Thanks!
After some trial and error, I found the solution:
sort -us -o out.json infile.json
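For the chunked approach from the question, one thing that may have been missing is -u on the merge step: sort -m on its own only interleaves inputs that are already sorted and keeps the duplicates. A small sketch, assuming every chunk was sorted under the same locale used for the merge (LC_ALL=C here); the file names and JSON lines are made up for illustration:

```shell
tmp=$(mktemp -d)

# Two chunks, each already sorted and de-duplicated on its own
printf '%s\n' '{"id":1}' '{"id":3}' > "$tmp/chunk-a.json"
printf '%s\n' '{"id":1}' '{"id":2}' > "$tmp/chunk-b.json"

# -m merges without re-sorting; -u drops lines repeated across chunks
LC_ALL=C sort -m -u -o "$tmp/merged.json" "$tmp/chunk-a.json" "$tmp/chunk-b.json"

merged=$(cat "$tmp/merged.json")
echo "$merged"
rm -rf "$tmp"
```

The duplicate {"id":1} line appears only once in the merged result. If the chunks were sorted under a different locale than the merge, the merged order (and therefore the de-duplication) may not be reliable.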

grep after matching pattern but exclude period

I have the following sample strings:
Program exiting with code: 0.
Program exiting with code: 1.
Program exiting with code: 10.
I want grep to return values after the matching pattern "Program exiting with code:". However I do not need the period at the end. I have tried:
grep "Program exiting with code:" dataload.log | sed 's/^.*.: //'
The above command returns:
0.
1.
10.
I want to ignore the period at end. I picked up the above command from somewhere.
Can someone describe what each keyword represents and provide me with a regex that will only provide me with the value without period?
sed, awk, perl or any other way is fine with me.
Just use grep with a look-behind and catch only digits:
$ grep -Po '(?<=Program exiting with code: )\d*' file
0
1
10
sed 's/^.*.: //'
First of all, this is a substitution regular expression, as denoted by the s at the start. The part between the first pair of slashes is what to match, and the part between the second pair is what to replace it with.
The characters in the first (match) part of the expression are all special regular expression characters.
A full list of what they mean can be found here: http://www.rexegg.com/regex-quickstart.html
The match means:
^ - At the start of the line
. - Match any character
* - Any number of times
. - Match any character
: - Match a colon
(space) - Match a space
And then all of that is replaced with nothing. The period at the end is kept because the expression only removes everything up to and including "Program exiting with code: ", which leaves, for example, 1.
It's worth playing around with an interactive tool to test different regular expressions on your string, e.g. http://www.regexr.com/
You can probably just substitute/remove everything that is not a number in your case: 's/[^0-9]//g'.
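sed can also do the matching and the stripping in a single step by capturing only the digits. A minimal sketch, assuming the log lines look like the samples in the question; the function name extract_codes is hypothetical:

```shell
# Reads log lines on stdin and prints only the exit codes.
# -n suppresses default output; the trailing p prints lines where the
# substitution succeeded, so unrelated lines are dropped.
extract_codes() {
    sed -n 's/.*Program exiting with code: \([0-9][0-9]*\).*/\1/p'
}

printf '%s\n' \
    'Program exiting with code: 0.' \
    'Some unrelated log line' \
    'Program exiting with code: 10.' |
    extract_codes
```

The \( \) group captures the run of digits, and \1 replaces the whole line with just that group, so the trailing period never reaches the output. This uses only basic regular expressions, so it does not depend on GNU grep's -P option.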
This is another way:
grep -oE 'Program exiting with code:\s*[0-9]+' dataload.log |grep -oE '[0-9]+$'
Output of the first grep command is:
Program exiting with code: 0
Program exiting with code: 1
Program exiting with code: 10
Then you just grep the last digits.
Your solution is just fine if you extend it with a cut command:
grep "Program exiting with code:" dataload.log | sed 's/^.*.: //' | cut -d"." -f1

ksh error remove from list

I am trying to remove a certain element from a list in Korn shell. It works on my Linux machine, but the exact same code gives me an error on a Solaris 11 machine. It's probably because of different ksh versions, but I would like to find a solution that works on both.
The code is:
#!/bin/ksh
MY_LIST="HELLO HOW ARE YOU"
toDel="HOW"
MY_LIST=( "${MY_LIST[@]/$toDel}" )
echo "MY LIST AFTER REMOVING HOW IS $MY_LIST"
On Solaris I get the following error:
syntax error at line 4 : '(' unexpected
Any suggestions?
Melodie wrote: Finally, I used 'Walter A' solution
Glad I could help.
So that you can vote for it and close the question, I am posting my comment as an answer.
MY_LIST=`echo $MY_LIST | sed "s/$toDel//"`
You'll probably need to spend some time with the ksh88 man page.
Without further explanation:
set -A my_list HELLO HOW ARE YOU # note, separate words
toDel=HOW
set -- # using positional parameters as "temp array"
for word in "${my_list[@]}"; do
    [[ $word != $toDel ]] && set -- "$@" "$word"
done
set -A my_list "$@"
printf "%s\n" "${my_list[@]}"
HELLO
ARE
YOU
Finally, I used Walter A's solution:
MY_LIST=`echo $MY_LIST | sed "s/$toDel//"`
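One thing to keep in mind with the sed approach is that it removes the first matching substring, so it would also cut "HOW" out of a word like "SHOW", and it leaves a double space behind. A word-by-word sketch that uses only plain string variables, and so should not depend on array syntax; the variable name new_list is made up for illustration, and it assumes the list elements contain no whitespace or glob characters:

```shell
MY_LIST="HELLO HOW ARE YOU"
toDel="HOW"

new_list=
# Unquoted on purpose: the shell splits the string into words
for word in $MY_LIST; do
    [ "$word" = "$toDel" ] && continue
    # Add a separating space only when new_list is already non-empty
    new_list="${new_list:+$new_list }$word"
done
MY_LIST=$new_list

echo "MY LIST AFTER REMOVING $toDel IS $MY_LIST"
```

This compares whole words, so only exact matches of $toDel are dropped.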

grep command to print follow-up lines after a match

How can I use the grep command to find a match and print the 10 lines that follow it? I need this to pull error statements out of log files. (Otherwise I have to download the file, search for the log time, and then copy the content.) Instead of downloading bulky files, I want to run a command that prints just those lines.
A default install of Solaris 10 or 11 will have the /usr/sfw/bin file tree. GNU grep is there as /usr/sfw/bin/ggrep. It supports /usr/sfw/bin/ggrep -A 10 [pattern] [file], which does what you want.
Solaris 9 and older may not have it. Or your system may not have been a default install. Check.
Suppose you have a file /etc/passwd and want to filter for the user "chetan".
Please try the command below:
cat /etc/passwd | /usr/sfw/bin/ggrep -A 2 'chetan'
It will print the line containing the string "chetan" and the next two lines as well.
-- Tested in Solaris 10 --
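Where GNU grep is not available, awk can approximate -A with a countdown counter. A hedged sketch: the function name after_match is hypothetical, and it assumes an awk that supports -v (on older Solaris releases that may mean calling nawk instead of awk):

```shell
# usage: after_match N PATTERN [FILE...]
# Prints each line matching PATTERN plus the N lines that follow it.
after_match() {
    n=$1; pattern=$2; shift 2
    # On a match, arm the counter for the match line plus n more lines;
    # "c && c--" is true (so the line prints) while the counter is positive.
    awk -v n="$n" -v pat="$pattern" '$0 ~ pat { c = n + 1 } c && c--' "$@"
}

printf '%s\n' root chetan alice bob carol | after_match 2 chetan
```

Unlike grep -A, this does not print a "--" separator between groups of matches, and the pattern is an awk regular expression rather than a grep one.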