I just received a large data array of random numbers: 20 numbers per line, 600,000 lines, in a CSV file. The numbers are separated by spaces instead of commas, so PostgreSQL reads each line as one long string and I cannot assign a proper data type to the column.
Each set of numbers will have a unique ID. Each number is 2 digits long. I want to be able to count the number of times a certain number was entered, that is, get the frequency of each number between certain IDs.
My questions:
What data type should I use to insert the data so it is recognized as integers instead of text?
How do I replace the space with a comma?
Do I need to replace the space with a comma?
Currently running Postgres 9.6, PgAdmin 4.
Bonus if answer is provided in PgAdmin as well.
Also here is a sample
Excel
Numbers
06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79
03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67
09 10 12 13 15 25 30 31 36 42 43 44 46 48 51 57 65 69 75 79
08 12 15 20 27 33 34 37 41 43 44 45 54 55 60 61 66 70 72 76
CSV FILE
Numbers
06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79
03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67
09 10 12 13 15 25 30 31 36 42 43 44 46 48 51 57 65 69 75 79
08 12 15 20 27 33 34 37 41 43 44 45 54 55 60 61 66 70 72 76
or the file with the id numbers
CSV
ID, Numbers
1253842,06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79
1253843,03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67
1253844,09 10 12 13 15 25 30 31 36 42 43 44 46 48 51 57 65 69 75 79
1253845,08 12 15 20 27 33 34 37 41 43 44 45 54 55 60 61 66 70 72 76
1253846,04 06 07 09 11 12 13 14 18 20 21 26 30 36 37 41 43 48 74 79
1253847,01 11 14 15 35 37 38 43 46 48 49 51 53 57 64 65 66 70 76 77
1253848,01 03 14 17 20 22 24 25 38 42 46 54 56 57 60 61 66 72 78 80
Here's the error message
ERROR: malformed array literal: "06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79"
DETAIL: Array value must start with "{" or dimension information.
CONTEXT: COPY Quick numbers, line 2, column numbers : "06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79"
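To make the goal concrete, here is a minimal Python sketch of the counting described above, done outside the database. It assumes the `ID,Numbers` layout from the sample; the function name `count_numbers` and the ID bounds passed to it are hypothetical, chosen only for illustration.

```python
import csv
import io
from collections import Counter

# Sample rows copied from the CSV above (ID, then 20 space-separated numbers).
SAMPLE = """ID, Numbers
1253842,06 18 20 21 24 32 36 40 44 47 50 52 55 57 60 61 62 68 72 79
1253843,03 05 12 13 14 16 17 18 24 28 33 34 35 39 44 55 62 63 64 67
1253844,09 10 12 13 15 25 30 31 36 42 43 44 46 48 51 57 65 69 75 79
"""

def count_numbers(text, first_id, last_id):
    """Count how often each number occurs for IDs in [first_id, last_id]."""
    counts = Counter()
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue
        row_id = int(row[0])
        if first_id <= row_id <= last_id:
            # split() with no argument splits on any run of whitespace
            counts.update(int(n) for n in row[1].split())
    return counts

counts = count_numbers(SAMPLE, 1253842, 1253843)
print(counts[18])  # 2: 18 appears in both selected rows
```

The point of the sketch is that the space-separated field has to be split and cast to integers before any per-number counting is possible; the comma only separates the ID from the rest.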
I have a matrix A of size 2500 x 500. I want to sum every 10 columns and get the result as a matrix B of size 2500 x 50. That is, the first column of B is the sum of the first 10 columns of A, the second column of B is the sum of the second 10 columns of A, and so on.
How can I do that without a for loop? I have to do this hundreds of times, and it is very time-consuming with a for loop.
First, we "block reshape" A, such that we have the desired number of columns. Therefore, we shamelessly steal the code from the great Divakar, and put in some minimal effort to generalize it. Then, we just need to sum along the second axis, and reshape to the original form.
Here's an example with five columns to be summed:
% Sample input data
A = reshape(1:100, 10, 10).'
[r, c] = size(A);
% Number of columns to be summed
n_cols = 5;
% Block reshape to n_cols, see https://stackoverflow.com/a/40508999/11089932
B = reshape(permute(reshape(A, r, n_cols, []), [1, 3, 2]), [], n_cols);
% Sum along second axis
B = sum(B, 2);
% Reshape to original form
B = reshape(B, r, c / n_cols)
That's the output:
A =
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
41 42 43 44 45 46 47 48 49 50
51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70
71 72 73 74 75 76 77 78 79 80
81 82 83 84 85 86 87 88 89 90
91 92 93 94 95 96 97 98 99 100
B =
15 40
65 90
115 140
165 190
215 240
265 290
315 340
365 390
415 440
465 490
Hope that helps!
This can be done with splitapply. An advantage of this approach is that it works even if the group size does not divide the number of columns (the last group is smaller):
A = reshape(1:120, 12, 10).'; % example 10×12 data (borrowed from HansHirse)
n_cols = 5; % number of columns to sum over
result = splitapply(@(x)sum(x,2), A, ceil((1:size(A,2))/n_cols));
In this example,
A =
1 2 3 4 5 6 7 8 9 10 11 12
13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 57 58 59 60
61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80 81 82 83 84
85 86 87 88 89 90 91 92 93 94 95 96
97 98 99 100 101 102 103 104 105 106 107 108
109 110 111 112 113 114 115 116 117 118 119 120
result =
15 40 23
75 100 47
135 160 71
195 220 95
255 280 119
315 340 143
375 400 167
435 460 191
495 520 215
555 580 239
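For comparison, the same grouping rule (column j belongs to group ceil(j / n_cols), so the last group may be smaller) can be sketched in plain Python. This is only an illustration of the idea, not a MATLAB replacement; `sum_column_groups` is a hypothetical helper name.

```python
import math

def sum_column_groups(matrix, n_cols):
    """Sum each run of n_cols adjacent columns; the last group may be smaller."""
    width = len(matrix[0])
    n_groups = math.ceil(width / n_cols)
    return [
        [sum(row[g * n_cols:(g + 1) * n_cols]) for g in range(n_groups)]
        for row in matrix
    ]

# Same 10x12 example data as above: rows 1..12, 13..24, ...
A = [list(range(r * 12 + 1, r * 12 + 13)) for r in range(10)]
result = sum_column_groups(A, 5)
print(result[0])   # [15, 40, 23]
print(result[-1])  # [555, 580, 239]
```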
I have written this code in q to solve Project Euler problem 18, as described in the link below, using recursion.
https://stackoverflow.com/questions/8002252/euler-project-18-approach
Though the code works, it is not efficient and hits a stack overflow for pyramids of size greater than 3000. How could I make this code much more efficient? I believe the optimal code can be less than 30 characters.
pyr:{[x]
lsize:count x;
y:x;
$[lsize <=1;y[0];
[.ds.lastone:x[lsize - 1];
.ds.lasttwo:x[lsize - 2];
y:{{max (.ds.lasttwo)[x] +/: .ds.lastone[x],.ds.lastone[x+1]}each til count .ds.lasttwo};
$[(count .ds.lasttwo)=1;y:{max (.ds.lasttwo) +/: .ds.lastone[x],.ds.lastone[x+1]}0;y:y[]];
x[lsize - 2]:y;
pyr[-1_x]]]
}
To properly implement this logic in q you need to use adverbs.
First, to quickly find the rolling maximums you can use the prior adverb. For example:
q)input:(75;95 64;17 47 82;18 35 87 10;20 04 82 47 65;19 01 23 75 03 34;88 02 77 73 07 63 67;99 65 04 28 06 16 70 92;41 41 26 56 83 40 80 70 33;41 48 72 33 47 32 37 16 94 29;53 71 44 65 25 43 91 52 97 51 14;70 11 33 28 77 73 17 78 39 68 17 57;91 71 52 38 17 14 91 43 58 50 27 29 48;63 66 04 68 89 53 67 30 73 16 69 87 40 31;04 62 98 27 23 09 70 98 73 93 38 53 60 04 23)
q)last input
4 62 98 27 23 9 70 98 73 93 38 53 60 4 23
q)1_(|) prior last input
62 98 98 27 23 70 98 98 93 93 53 60 60 23
That last line outputs a vector with the maximum value of each successive pair in the input vector. Once you have this you can add it to the next row up and repeat.
q)foo:{y+1_(|) prior x}
q)foo[input 14;input 13]
125 164 102 95 112 123 165 128 166 109 122 147 100 54
Then, to apply this function over the whole input, use the over adverb:
q)foo over reverse input
,1074
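The same bottom-up fold can be sketched in Python, which may make the data flow easier to follow. It is iterative, so it does not grow the call stack with the height of the pyramid. The names `fold_row` and `max_path_sum` are hypothetical, and the small triangle is the four-row example from the Project Euler problem statement.

```python
from functools import reduce

def fold_row(below, above):
    """Add to each entry of `above` the larger of its two children in `below`."""
    return [a + max(l, r) for a, l, r in zip(above, below, below[1:])]

def max_path_sum(triangle):
    # Fold from the bottom row upwards, like `foo over reverse input`.
    return reduce(fold_row, reversed(triangle))[0]

small = [[3], [7, 4], [2, 4, 6], [8, 5, 9, 3]]
print(max_path_sum(small))  # 23
```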
EDIT: The approach above can be generalized further.
q provides a moving max function mmax. With this you can find "the x-item moving maximum of numeric y", which generalizes the use of prior above. For example, you can use this to find the moving maximum of pairs or triplets in the last row of the input:
q)last input
4 62 98 27 23 9 70 98 73 93 38 53 60 4 23
q)2 mmax last input
4 62 98 98 27 23 70 98 98 93 93 53 60 60 23
q)3 mmax last input
4 62 98 98 98 27 70 98 98 98 93 93 60 60 60
mmax can be used to simplify foo above:
q)foo:{y+1_ 2 mmax x}
What's especially nice about this is that it can be used to generalize to variants of this problem with wider triangles. For example, the triangle below has two more values on each row and from any point on a row you can move to the left, middle, or right of the row below it.
5
5 6 7
6 7 3 9 1
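As a rough illustration of what the moving maximum computes, here is a minimal Python sketch that reproduces the `2 mmax` and `3 mmax` outputs shown above. The function is a plain re-implementation written for this example, not q's own code.

```python
def mmax(window, values):
    """Moving maximum: entry i is the max of the last `window` items up to i."""
    return [max(values[max(0, i - window + 1):i + 1]) for i in range(len(values))]

last_row = [4, 62, 98, 27, 23, 9, 70, 98, 73, 93, 38, 53, 60, 4, 23]
print(mmax(2, last_row))
print(mmax(3, last_row))
```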
I would like to reshape a vector into a number of 'slices' (in Matlab) but find myself in a brain freeze and can't come up with a good way (e.g. a one-liner) to do it:
a=1:119;
slices=[47 24 1 47];
result={1:47,48:71,...};
the result doesn't need to be stored in a cell array.
Thanks
This is what mat2cell does:
>> a=1:119;
>> slices=[47 24 1 47];
>> result = mat2cell(a, 1, slices) % 1 is # of rows in result
result =
{
[1,1] =
Columns 1 through 15:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Columns 16 through 30:
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Columns 31 through 45:
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
Columns 46 and 47:
46 47
[1,2] =
Columns 1 through 15:
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
Columns 16 through 24:
63 64 65 66 67 68 69 70 71
[1,3] = 72
[1,4] =
Columns 1 through 13:
73 74 75 76 77 78 79 80 81 82 83 84 85
Columns 14 through 26:
86 87 88 89 90 91 92 93 94 95 96 97 98
Columns 27 through 39:
99 100 101 102 103 104 105 106 107 108 109 110 111
Columns 40 through 47:
112 113 114 115 116 117 118 119
}
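The slicing rule behind this (each piece starts where the previous one ended) can be sketched in Python with running totals of the slice lengths. `split_by_lengths` is a hypothetical helper name, used only for this illustration.

```python
from itertools import accumulate

def split_by_lengths(values, lengths):
    """Cut `values` into consecutive pieces with the given lengths."""
    ends = list(accumulate(lengths))
    starts = [0] + ends[:-1]
    return [values[s:e] for s, e in zip(starts, ends)]

a = list(range(1, 120))          # 1..119
slices = [47, 24, 1, 47]
result = split_by_lengths(a, slices)
print([len(piece) for piece in result])  # [47, 24, 1, 47]
print(result[2])                          # [72]
```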
I am having a hard time understanding this syntax:
val grid = {
val input = """ 08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
|49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
|81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
|52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
|22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
|24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
|32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
|67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
|24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
|21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
|78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
|16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
|86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
|19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
|04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
|88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
|04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
|20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
|20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
|01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48 """
.stripMargin
val rows = input.split("\n").map(_.trim)
rows.map(_.split(" ").map(_.toInt))
}
Here grid is of type Array[Array[Int]]. I understand that we are creating a 2D array based on some logic inside the {} of grid. But what is this val grid = {}, and how can we do calculations inside of it?
In Scala, everything is an expression, and you can combine any sequence of expressions at basically any position in the program.
So here, we assign something to grid, and that something is the content of the {} block expression. You can have arbitrary sequences of expressions in a block expression, so in your example we start by defining two vals, mostly to make the code easier to read.
Then the last expression, which performs a map on the previously defined rows, is the expression returned by the block. That is the value that will be assigned to grid.
A good reason to do such a thing here is that both input and rows are only visible in the block where they are defined. That means they will not pollute the scope where you use grid. It's actually very good style.
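For readers coming from a language without block expressions, the scoping idea can be sketched in Python with a small helper function whose locals stay private. This is only an analogy, not Scala semantics; `build_grid` is a hypothetical name and the input is shortened to two rows of four numbers.

```python
def build_grid():
    # `text` and `rows` are local, so they do not leak into the caller's scope,
    # much like `input` and `rows` inside the Scala block.
    text = """08 02 22 97
              49 49 99 40"""
    rows = [line.strip() for line in text.split("\n")]
    # The returned value plays the role of the block's final expression.
    return [[int(n) for n in row.split(" ")] for row in rows]

grid = build_grid()
print(grid)  # [[8, 2, 22, 97], [49, 49, 99, 40]]
```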
Just as a method declaration (def) can have a block expression as its body, so can a value definition (val).
If we look into the specification (6.11) under "Block", we can see the definition for a valid block declaration:
A block expression { s1; …; sn; e } is constructed from a sequence
of block statements s1,…,sn and a final expression e. The
statement sequence may not contain two definitions or declarations
that bind the same name in the same namespace. The final expression
can be omitted, in which case the unit value () is assumed.
The expected type of the final expression e is the expected type of
the block. The expected type of all preceding statements is undefined.
It then goes on to specify the definition for a value declared inside a block:
A locally defined value definition val x: T = e is bound by the
existential clause val x: T
Evaluation of the block entails evaluation of its statement sequence,
followed by an evaluation of the final expression e, which defines
the result of the block.
This shows that a block expression is valid on the right-hand side of a value definition as well as in a method body. This is particularly useful when you have a block of code that is relevant only to the initialization of the member, allowing you to write a more complex initialization sequence.
I'm brushing up on my MATLAB skills, which haven't been used in a very long time. To do so I've been doing the puzzles over at Project Euler. Well, I'm kind of stumped on this one, as everything seems to run fine when I break it into parts, but my greatest number is apparently not correct.
Anyway here's my code
%This script will take a grid and find the greatest product of 4 numbers
%up, down, left right, and diagonal
%create the grid
grid = [08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08;
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00;
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65;
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91;
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80;
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50;
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70;
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21;
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72;
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95;
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92;
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57;
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58;
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40;
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66;
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69;
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36;
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16;
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54;
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48;];
%grid(row, column)
%find how many rows and columns there are
[rowNum, columnNum] = size(grid);
%Current greatest product of four consecutive
greatest = 0;
%test left right
%iterate through all rows
for i = 1:rowNum
%iterate through all columns except the last three
for x = 1:columnNum-3
%test the consecutive numbers starting with the current column
%iteration
current = 1;
for test = x:x+3
current = current * grid(i, test);
if greatest < current
greatest = current;
end
end
end
end
%iterate through all columns
for i = 1:columnNum
%iterate through all rows except the last three
for x = 1:rowNum-3
%test the consecutive numbers starting with the current row
%iteration
current = 1;
for test = x:x+3
current = current * grid(test, i);
if greatest < current
greatest = current;
end
end
end
end
for i = 1:columnNum-3
%iterate through all columns except the last three
for x = 1:rowNum-3
%test the consecutive numbers starting with the current column
%iteration
current = 1;
%count adds the number of iterations to the row number in grid
%forcing it to move in a diagonal line
count = 0;
%test consecutively from x to three to the right
for test = x:x+3
%make new current product
current = current * grid(i+count, test);
%add to count to shift down one row next iteration
count = count + 1;
%check for greatest
if greatest < current
greatest = current;
end
end
end
end
disp(greatest)
I think the problem lies in the nested for loops searching the diagonal, but I'm not sure.
I've broken it up into pieces (tested each for loop on its own and had it output each element as it searched through them). I'm pretty sure the horizontal and vertical search methods work correctly.
Just wanted a fresh set of eyes to look at this, Thanks!
Also I'm sure there is a more efficient way to do this so if anyone has some ideas for a better algorithm I'd love to hear it!
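One thing that stands out in the code above is that the diagonal loop only walks down and to the right, so runs along the other diagonal are never multiplied, which may be why the maximum comes out too low. A minimal Python sketch of a search over all four directions follows. It assumes non-negative entries (it starts from 0, like the MATLAB code), and `greatest_product` plus the small 4x4 grid are hypothetical, chosen so that the maximum lies only on the down-left diagonal.

```python
def greatest_product(grid, run=4):
    """Largest product of `run` adjacent cells in a row, column, or either diagonal."""
    n_rows, n_cols = len(grid), len(grid[0])
    # (row step, column step): right, down, down-right, down-left
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    best = 0
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in directions:
                end_r, end_c = r + (run - 1) * dr, c + (run - 1) * dc
                if not (0 <= end_r < n_rows and 0 <= end_c < n_cols):
                    continue  # this run would leave the grid
                product = 1
                for k in range(run):
                    product *= grid[r + k * dr][c + k * dc]
                # Compare only after all `run` factors are multiplied in.
                best = max(best, product)
    return best

small = [
    [1, 1, 1, 9],
    [1, 1, 9, 1],
    [1, 9, 1, 1],
    [9, 1, 1, 1],
]
print(greatest_product(small))  # 6561, found only on the down-left diagonal
```

Treating each direction as a (row step, column step) pair keeps the four searches in one loop, so adding or dropping a direction is a one-line change.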