Best way to index large file - mongodb

I have a file of about 100 GB with one word:tag pair per line. I want to index it by word so I can easily get the list of tags for a given word.
I wanted to store this in boltdb (mainly to check out boltdb), but its random write performance is poor, so my plan was to index the file some other way first and then move everything to boltdb, without having to check for duplicates or (de)serialise the tag list.
For reference, if I simply read the file into memory (discarding the data), I get about 8 MB/s.
If I write to boltdb files using code such as
line := ""
linesRead := 0
for scanner.Scan() {
	line = scanner.Text()
	linesRead += 1
	data := strings.Split(line, ":")
	err = bucket.Put([]byte(data[0]), []byte(data[1]))
	logger.FatalErr(err)
	// commit on every N lines
	if linesRead % 10000 == 0 {
		err = tx.Commit()
		logger.FatalErr(err)
		tx, err = db.Begin(true)
		logger.FatalErr(err)
		bucket = tx.Bucket(name)
	}
}
I get about 300 KB/s, and this is not even complete: it doesn't append the tag to each word's list, it only stores the last occurrence. Adding the array and JSON serialisation would definitely lower that speed further.
So I gave MongoDB a try:
index := mgo.Index{
	Key: []string{"word"},
	Unique: true,
	DropDups: false,
	Background: true,
	Sparse: true,
}
err = c.EnsureIndex(index)
logger.FatalErr(err)
line := ""
linesRead := 0
bulk := c.Bulk()
for scanner.Scan() {
	line = scanner.Text()
	data := strings.Split(line, ":")
	bulk.Upsert(bson.M{"word": data[0]}, bson.M{"$push": bson.M{"tags": data[1]}})
	linesRead += 1
	if linesRead % 10000 == 0 {
		_, err = bulk.Run()
		logger.FatalErr(err)
		bulk = c.Bulk()
	}
}
And I get about 300 KB/s as well (though here Upsert and $push do handle appending to the list).
I also tried a local MySQL instance (indexed on word), but it was about 30x slower.
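One way to cut down on random writes, sketched below, is to group the tags of each word within a batch and emit the words in sorted order, so each key is written once per batch and in key order. This is only an illustrative sketch: groupBatch is a hypothetical helper, the batch is assumed to fit in memory, and tag lists for words that recur in later batches would still have to be merged.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupBatch (hypothetical helper) collects the tags of each word found in a
// batch of "word:tag" lines. It returns the distinct words in sorted order
// together with the grouped tags. Lines without a colon are skipped.
func groupBatch(lines []string) ([]string, map[string][]string) {
	tags := make(map[string][]string)
	for _, line := range lines {
		word, tag, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		tags[word] = append(tags[word], tag)
	}
	words := make([]string, 0, len(tags))
	for w := range tags {
		words = append(words, w)
	}
	sort.Strings(words)
	return words, tags
}

func main() {
	batch := []string{"cat:animal", "apple:fruit", "cat:pet"}
	words, tags := groupBatch(batch)
	for _, w := range words {
		// In a real loader this is where one Put or one upsert per word would go.
		fmt.Println(w, tags[w])
	}
}
```

With this shape, the number of writes per batch equals the number of distinct words in it rather than the number of lines.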

flowing lights in structured text

I am very new to structured text, so pardon my simple question.
I am using OpenPLC to create a simple flowing lights program in structured text, following the example in the video linked below. In that video, they use 5 LEDs and control them with case statements.
My question is: if my program needs to turn on 100 lights, how should I change the code?
Should I use for loops? How?
https://www.youtube.com/watch?v=PXnaULHpxC8&t=25s
Yes, you can use for loops etc. to make the program more "dynamic".
Unfortunately, most PLCs don't give you dynamic access to their digital outputs. This means that in the end you will have to write code that translates the values from an array (which you loop through) into digital outputs.
There are a few ways to do that. First, let me show how you can create a chasing light for up to 16 outputs.
PROGRAM PLC_PRG
VAR
    iNumOfLights : INT := 6;
    fbCounter : CTU;
    fbTicker : BLINK := (ENABLE := TRUE, TIMELOW := T#100MS, TIMEHIGH := T#1S);
    wOut: WORD;
END_VAR
    fbTicker();
    fbCounter(CU := fbTicker.OUT, RESET := fbCounter.Q, PV := iNumOfLights);
    wOut := SHL(2#0000_0000_0000_0001, fbCounter.CV);
    A := wOut.0;
    B := wOut.1;
    C := wOut.2;
    D := wOut.3;
    E := wOut.4;
    F := wOut.5;
    G := wOut.6;
END_PROGRAM
Or, if you know the output address, you can map the variable directly to the outputs.
PROGRAM PLC_PRG
VAR
    iNumOfLights : INT := 6;
    fbCounter : CTU;
    fbTicker : BLINK := (ENABLE := TRUE, TIMELOW := T#100MS, TIMEHIGH := T#1S);
    wOut AT %QB0.1: WORD;
END_VAR
    fbTicker();
    fbCounter(CU := fbTicker.OUT, RESET := fbCounter.Q, PV := iNumOfLights);
    wOut := SHL(2#0000_0000_0000_0001, fbCounter.CV);
END_PROGRAM
You can also change the type of chasing light with something like this:
IF fbCounter.CV = 0 THEN
    wOut := 0;
END_IF;
wOut := wOut OR SHL(2#0000_0000_0000_0001, fbCounter.CV);
Now, what is behind this? The SHL operator shifts the bits of its first operand to the left by the given number of positions. For example, SHL(2#0000_0000_0000_0001, 3) results in 2#0000_0000_0000_1000. So we assign the result to wOut and then access the individual bits with wOut.[n].
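The shift-and-read-bits idea can be illustrated outside the PLC as well. The following Go sketch is only an illustration, not PLC code: shiftPattern and bitSet mirror SHL and the wOut.n bit access on a 16-bit word, and chase is a hypothetical helper showing the same pattern with a slice when more lights are needed than fit in one word (it assumes step >= 0 and n > 0).

```go
package main

import "fmt"

// shiftPattern mirrors SHL(2#0000_0000_0000_0001, step) on a 16-bit word.
func shiftPattern(step uint) uint16 {
	return uint16(1) << step
}

// bitSet reports whether bit n of w is set, like wOut.n in the ST program.
func bitSet(w uint16, n uint) bool {
	return w&(uint16(1)<<n) != 0
}

// chase returns n light states in which only the light at index step%n is on.
// It assumes step >= 0 and n > 0.
func chase(step, n int) []bool {
	lights := make([]bool, n)
	lights[step%n] = true
	return lights
}

func main() {
	w := shiftPattern(3)
	fmt.Printf("%016b\n", w)                // prints 0000000000001000
	fmt.Println(bitSet(w, 3), bitSet(w, 2)) // prints true false

	lights := chase(205, 100)
	fmt.Println(len(lights), lights[5]) // prints 100 true
}
```

On a real PLC the last step, copying each element to a physical output, still has to be written out or mapped by address, as described above.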

Autohotkey Matrix (AHK)

At the moment I am working on AI in AHK.
Now I have the problem that I don't know how to deal with a matrix. See below an example matrix:
WeightLooper := 1
Loop %NumberOfWeightsLayerTotal%
{
    Random, Weight_%WeightLooper%, -1.0, 1.0
    WeightLooper := WeightLooper + 1
}
WEIGHTS_1 := Array([Weight_1, Weight_2, Weight_3, Weight_4], [Weight_5, Weight_6, Weight_7, Weight_8], [Weight_9, Weight_10, Weight_11, Weight_12])
TRAINING_INPUTS := []
rows := (LastFilledY - 1)
columns := (LastFilledX - 1)
Xas := 0
Yas := 0
Loop, % rows
{
    Xas := 0
    Yas := Yas + 1
    row := []
    Loop, % columns
    {
        Xas := Xas + 1
        row.push(myarray[Yas][Xas])
    }
    TRAINING_INPUTS.push(row)
}
Now I have a matrix of 3x4. Suppose I want a matrix of 10x10, how do I do that? So basically I want to create a variable matrix.
I ask this because my input (csv file) can vary from 2x2 to 1000000x1000000.
I'd probably recommend pushing a new array into the array in a loop:
WEIGHTS_1 := []
rows := 5
columns := 7
Loop, % rows
{
    row := []
    Loop, % columns
    {
        Random, weight, -1.0, 1.0
        row.push(weight)
    }
    WEIGHTS_1.push(row)
}
Example output:
[[-0.678368, -0.768605, -0.274922, 0.049760, -0.133968, -0.876030, -0.235799]
,[-0.296078, 0.359816, -0.461632, 0.788800, -0.707147, -0.200223, -0.473914]
,[0.474090, 0.085090, 0.458321, -0.820574, 0.145089, 0.193249, 0.990545]
,[0.205461, 0.901953, -0.137901, 0.279726, 0.562361, -0.019861, -0.887540]
,[0.504811, -0.876628, -0.127397, 0.156817, 0.873983, 0.859992, -0.879222]]

AHK array of file names and modified time

I'm looking to create an array of file names and their modified times. I can build the two arrays separately, but how can I build a single array that looks like this?
[ [file1, modtime1], [file2, modtime2], ...]
Here is the script that builds each individual array.
modTime := []
filenames := []
counter := 1
Full_Path := "C:\Users\me\MyDocs\*.txt"
Loop, %Full_Path%
{
    modTime[counter] := A_LoopFileTimeModified
    filenames[counter] := A_LoopFileFullPath
    counter++
}
Loop % modTime.MaxIndex()
    items .= modTime[A_Index] ","
StringLeft, items, items, StrLen(items)-1
MsgBox % items
Loop % filenames.MaxIndex()
    items .= filenames[A_Index] ","
StringLeft, items, items, StrLen(items)-1
MsgBox % items
return
modTime := []
counter := 1
Full_Path = % "C:\Users\me\MyDocs\*.txt"
Loop, % Full_Path
{
    SplitPath, % A_LoopFileFullPath, file_name
    modTime.Push([file_name, A_LoopFileTimeModified])
    counter++
}
For each, element in modTime
    items .= element[1] ", " element[2] "`n"
MsgBox, % RTrim(items, "`n")

How to copy massive data to clipboard

I have an array such as [1, 2, 56, 32, 54].
How do I send it to the clipboard, one value per line, like this?
1
2
56
32
54
I tried this.
Loop % table.MaxIndex() {
    meow := table[A_Index]
    Clipboard := meow"`r"
}
table := [1,2,3,4,5,6,7,8,9]
vClipboard := ClipboardAll
Clipboard := ""
Loop, % table.Count()
string := string ? string . ", " . table[A_Index] : table[A_Index]
Clipboard := string
;Do some stuff.
Clipboard := vClipboard ;Restore the Clipboard.
One problem was looping with table.MaxIndex(), which can give you an unexpected count, since an array can hold just 2 values whose indices are very far apart, e.g.
table := [1]
table[93] := "String"
Also, each iteration was overwriting your meow value (and the Clipboard) instead of appending to it. What you want is concatenation, e.g.
meow := "Hello"
meow := meow . "World"
or, equivalently,
meow .= "World"
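The accumulate-then-assign-once pattern can also be sketched in Go. This sketch only builds the text; putting it on the clipboard is left out, since that needs a platform-specific mechanism. joinLines is a hypothetical helper, and "\r\n" is assumed as the line separator.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// joinLines (hypothetical helper) turns the values into one string with one
// value per line, without a trailing line break.
func joinLines(values []int) string {
	parts := make([]string, 0, len(values))
	for _, v := range values {
		parts = append(parts, strconv.Itoa(v))
	}
	return strings.Join(parts, "\r\n")
}

func main() {
	text := joinLines([]int{1, 2, 56, 32, 54})
	// %q shows the separators explicitly.
	fmt.Printf("%q\n", text) // prints "1\r\n2\r\n56\r\n32\r\n54"
}
```

The key point is the same as in the answer above: build the whole string first, then write it to its destination a single time.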

Maple: RNG is not random

I was "finding Pi" with the Monte Carlo method, but the answer was incorrect. The original code was:
with(RandomTools[MersenneTwister]): with(Statistics):
tries := 10000:
s := 0;
for i to tries do
if GenerateFloat()^2+GenerateFloat()^2 < 1 then s := s+1 end if;
end do:
evalf(4*s/tries)
It gives an answer of around 2.8-2.85.
When I change the code to
s := 0;
x := Array([seq(GenerateFloat(), i = 1 .. tries)]);
y := Array([seq(GenerateFloat(), i = 1 .. tries)]);
for i to tries do
if x[i]^2+y[i]^2 < 1 then s := s+1 end if;
end do:
evalf(4*s/tries)
Then the answer is correct. I have no idea why I can't generate the numbers inside the "for" loop.
I found that the mean is the same, but the variance is different.
For:
tries := 100000;
A := Array([seq(GenerateFloat(), i = 1 .. 2*tries)]);
s1 := Array([seq(A[i]^2+A[tries+i]^2, i = 1 .. tries)]);
Mean(s1);
Variance(s1);
s2 := Array([seq(GenerateFloat()^2+GenerateFloat()^2, i = 1 .. tries)]);
Mean(s2);
Variance(s2);
output is:
0.6702112097021581
0.17845439723457215
0.664707674135025
0.35463131700965245
What's wrong with it? GenerateFloat() should be as uniform as possible.
Automatic simplification is turning your
GenerateFloat()^2+GenerateFloat()^2
into
2*GenerateFloat()^2
before GenerateFloat() is evaluated, so only one random number is drawn per iteration.
One simple change to get it to work as you expected would be to separate the two calls. E.g.,
restart:
with(RandomTools[MersenneTwister]):
tries := 10^4:
s := 0:
for i to tries do
t1,t2 := GenerateFloat(),GenerateFloat();
if t1^2+t2^2 < 1 then s := s+1 end if;
end do:
evalf(4*s/tries);
Another way is to use a slightly different construction which doesn't automatically simplify. Note that single right quotes (uneval quotes) delay evaluation but don't stop automatic simplification (which you could take as a definition of the term, if you want).
'f()^2 + f()^2';
2*f()^2
But the following does not automatically simplify,
a:=1:
'f()^2 + a*f()^2';
f()^2 + a*f()^2
Therefore another easy workaround is,
restart:
with(RandomTools[MersenneTwister]):
tries := 10^4:
s := 0:
a := 1;
for i to tries do
if GenerateFloat()^2 + a*GenerateFloat()^2 < 1 then s := s+1 end if;
end do:
evalf(4*s/tries);