Import Amazon txt dataset into Neo4j

I am fairly new to Neo4j and so far have only loaded some CSV files.
Now I am trying to load the Amazon "Product co-purchasing network" dataset from:
https://snap.stanford.edu/data/#amazon
More precisely this one:
https://snap.stanford.edu/data/amazon-meta.html
I am wondering: what is the correct way to load this file?
The file is a simple text file where products are separated by an empty line, so it looks like this:
Id: 1
ASIN: 0827229534
title: Patterns of Preaching: A Sermon Sampler
group: Book
salesrank: 396585
similar: 5 0804215715 156101074X 0687023955 0687074231 082721619X
categories: 2
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Preaching[12368]
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Sermons[12370]
reviews: total: 2 downloaded: 2 avg rating: 5
2000-7-28 cutomer: A2JW67OY8U6HHK rating: 5 votes: 10 helpful: 9
2003-12-14 cutomer: A2VE83MZF98ITY rating: 5 votes: 6 helpful: 5
Id: 2
ASIN: 0738700797
...
So far I have tried to load it like a normal text file:
LOAD CSV FROM "file:///data/amazon-meta.txt" AS line
RETURN line
SKIP 2
LIMIT 10
The code returns the data in the following format:
["Id: 1"]
["ASIN: 0827229534"]
[" title: Patterns of Preaching: A Sermon Sampler"]
[" group: Book"]
[" salesrank: 396585"]
[" similar: 5 0804215715 156101074X 0687023955 0687074231 082721619X"]
[" categories: 2"]
[" |Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Preaching[12368]"]
[" |Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Sermons[12370]"]
[" reviews: total: 2 downloaded: 2 avg rating: 5"]
[" 2000-7-28 cutomer: A2JW67OY8U6HHK rating: 5 votes: 10 helpful: 9"]
[" 2003-12-14 cutomer: A2VE83MZF98ITY rating: 5 votes: 6 helpful: 5"]
["Id: 2"]
["ASIN: 0738700797"]
So my next thought was to just merge all the lines from one "Id:" line to the next together, but I am not sure if this is even possible or a good solution, since it seems quite complicated.
What would be a good way to load this dataset?
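For what it's worth, the "merge everything between Id: markers" idea can be sketched directly in Cypher. This is a minimal sketch, not a definitive recipe: it assumes the file contains no tab characters (so FIELDTERMINATOR '\t' keeps each physical line intact), that stray quote characters don't trip the CSV parser, and it only extracts the Id. It also collects the whole file into a single list, so it is realistic only for a small sample; for the full file (roughly 550,000 products), preprocessing the blocks into a proper CSV outside Neo4j is likely the more robust route.

LOAD CSV FROM "file:///data/amazon-meta.txt" AS row FIELDTERMINATOR '\t'
WITH collect(row[0]) AS lines
// cut the flat line list into one sublist per product, starting a new block at every "Id:" line
WITH reduce(blocks = [], line IN lines |
       CASE
         WHEN trim(line) = '' THEN blocks                    // skip the blank separator lines
         WHEN line STARTS WITH 'Id:' THEN blocks + [[line]]  // start a new product block
         WHEN size(blocks) = 0 THEN blocks                   // ignore the file header before the first Id:
         ELSE blocks[0..-1] + [last(blocks) + line]          // append the line to the current block
       END) AS blocks
UNWIND blocks AS productLines
// productLines[0] looks like "Id: 1"; create one node per block
MERGE (p:Product {id: toInteger(trim(split(productLines[0], ':')[1]))})
RETURN count(p)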

Related

Final MATLAB Project

I am working on a final MATLAB project and, while I have the code written, I am having an issue where it won't graph the imported data.
The first two parts are as follows:
% part 1: open the file
Data = readtable('Proyecto_Final.xlsx');
opts = 'skip Ln1';
% part 2: line plots
hold on;
figure(1);
x=Data(:, 1);
y=Data(:, 2:11);
plot(x,y,'m*:');
xlabel('Time(s)');
ylabel("Day 1", "Day 2", "Day 3", "Day 4", "Day 5", "Day 6", "Day 7", "Day 8", "Day 9", "Day 10");
hold off;
The data in question comes from an Excel file, and the following errors come through:
Error using tabular/plot (line 217)
Tables and timetables do not have a plot method. To plot a table or a timetable, use the stackedplot function. As an
alternative, extract table or timetable variables using dot or brace subscripting, and then pass the variables as input
arguments to the plot function.
Error in ProyectoFinal (line 17)
plot(x,y,'m*:');
I did add this under the ylabel, as stackedplot(x, y);
You can't extract a column of a MATLAB table with Data(:, 1); that syntax returns a one-column sub-table, not a vector. You should use Data.Var1 instead, where Var1 is the name of the column.
Using the following test.txt file, a minimal working example (MWE) is provided. Edit this as per your table:
test.txt
1 12 12 47
2 24 19 32
4 45 48 31
5 54 12 27
6 68 95 56
7 82 45 56
8 94 36 56
9 102 12 24
MWE:
Data = readtable('test.txt');
figure(1);
hold on;
x = Data.Var1;
y = [Data.Var2, Data.Var3, Data.Var4];
plot(x,y,'m*:');
xlabel('Time(s)');
ylabel("Day data")
legend("Day 1", "Day 2", "Day 3");
hold off;
Output plot: (image not included here)
Note: I think that with the ylabel command you are actually trying to produce a legend. See the corresponding documentation (ylabel and legend).
Solved the issues pertaining to the project. I stayed up all night until 6 am going back and forth on a Discord server called matlab; the project ran well and was successfully submitted. Thanks for all the help and pointers.

"Error generating chart: Computation timed out" error in Google Earth Engine

I'm trying to generate a chart of my NDVI values, but I'm getting this error:
Error generating chart: Computation timed out.
This is my code; any help or advice would be really appreciated!
It's important to mention that my AOI file contains more than 500 polygons, which have been converted to a multipolygon. I don't know if this is a problem.
Thanks a lot!
// Select the NDVI band from the image collection
var selectNDVI_2020 = getNDVI_2020.select(['NDVI']);
// Plot the NDVI
var plot_index = ui.Chart.image.seriesByRegion(selectNDVI_2020, aoi, ee.Reducer.median(), 'NDVI', 5000, 'system:time_start', 'system:index')
  .setChartType('LineChart')
  .setOptions({
    title: 'NDVI index for ' + metavliti + ' for the year 2020',
    hAxis: {title: 'Date'},
    vAxis: {title: 'NDVI'},
    viewWindow: {max: 1, min: 0},
    colors: ['blue'],
    curveType: 'function',
    pointSize: 4
  });
print(plot_index,'NDVI_2020');
Maybe the AOI is too large for the computation.
I changed my "scale" parameter from 30 to 300 and found that it worked.
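For illustration, here is that change in code form: the same kind of seriesByRegion call as in the question, with the scale argument raised from 30 to 300 as described. The variable names reuse those from the question, and 300 is the answer's value, not one tuned to this particular AOI:

// Same chart call, but with a coarser scale so the median reduction
// over the 500+ polygons does far less work per image.
var chart = ui.Chart.image.seriesByRegion(
    selectNDVI_2020, aoi, ee.Reducer.median(),
    'NDVI', 300, 'system:time_start', 'system:index');
print(chart, 'NDVI_2020');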

How to comment on a specific line number of a PR on GitHub

I am trying to write a small script that can comment on GitHub PRs using ESLint output.
The problem is that ESLint gives me the absolute line numbers for each error, but the GitHub API wants the line number relative to the diff.
From the GitHub API docs: https://developer.github.com/v3/pulls/comments/#create-a-comment
To comment on a specific line in a file, you will need to first
determine the position in the diff. GitHub offers an
application/vnd.github.v3.diff media type which you can use in a
preceding request to view the pull request's diff. The diff needs to
be interpreted to translate from the line in the file to a position in
the diff. The position value is the number of lines down from the
first "@@" hunk header in the file you would like to comment on.
The line just below the "@@" line is position 1, the next line is
position 2, and so on. The position in the file's diff continues to
increase through lines of whitespace and additional hunks until a new
file is reached.
So if I want to add a comment on new line number 5 in a diff like the one shown in the answer below, then I would need to pass 12 to the API.
My question is: how can I easily map the new line numbers, which ESLint gives in its error messages, to the relative line numbers required by the GitHub API?
What I have tried so far
I am using parse-diff to convert the diff provided by the GitHub API into a JSON object:
[{
  "chunks": [{
    "content": "@@ -OLD_STARTING_LINE_NUMBER,OLD_TOTAL_LINES +NEW_STARTING_LINE_NUMBER,NEW_TOTAL_LINES @@",
    "changes": [{
      "type": STRING("normal"|"add"|"del"),
      "normal": BOOLEAN,
      "add": BOOLEAN,
      "del": BOOLEAN,
      "ln1": OLD_LINE_NUMBER,
      "ln2": NEW_LINE_NUMBER,
      "content": STRING
    }],
    "oldStart": NUMBER,
    "oldLines": NUMBER,
    "newStart": NUMBER,
    "newLines": NUMBER
  }]
}]
I am thinking of the following algorithm:
- make an array of new line numbers from NEW_STARTING_LINE_NUMBER to NEW_STARTING_LINE_NUMBER + NEW_TOTAL_LINES for each file
- subtract newStart from each number and make it another array, relativeLineNumbers
- traverse through the array and, for each deleted line (type === 'del'), increment the corresponding remaining relativeLineNumbers
- for each further hunk (a line having @@), decrement the corresponding remaining relativeLineNumbers
I have found a solution. I didn't post it here at first because it involves simple looping and nothing special, but I am answering now to help others.
I have opened a pull request to recreate a situation similar to the one shown in the question:
https://github.com/harryi3t/5134/pull/7/files
Using the GitHub API one can get the diff data:
diff --git a/test.js b/test.js
index 2aa9a08..066fc99 100644
--- a/test.js
+++ b/test.js
@@ -2,14 +2,7 @@
var hello = require('./hello.js');
-var names = [
- 'harry',
- 'barry',
- 'garry',
- 'harry',
- 'barry',
- 'marry',
-];
+var names = ['harry', 'barry', 'garry', 'harry', 'barry', 'marry'];
var names2 = [
'harry',
@@ -23,9 +16,7 @@ var names2 = [
// after this line new chunk will be created
var names3 = [
'harry',
- 'barry',
- 'garry',
'harry',
'barry',
- 'marry',
+ 'marry', 'garry',
];
Now just pass this data to the parse-diff module and do the computation:
var parseDiff = require('parse-diff');
var parsedFiles = parseDiff(data); // 'data' is the raw diff from the GitHub API
parsedFiles.forEach(
function (file) {
var relativeLine = 0;
file.chunks.forEach(
function (chunk, index) {
if (index !== 0) // relative line number should increment for each chunk
relativeLine++; // except the first one (see relative line 16 in the table below)
chunk.changes.forEach(
function (change) {
relativeLine++;
console.log(
change.type,
change.ln1 ? change.ln1 : '-',
change.ln2 ? change.ln2 : '-',
change.ln ? change.ln : '-',
relativeLine
);
}
);
}
);
}
);
This would print
type | old line (ln1) | new line (ln2) | added/deleted line (ln) | relative line
normal 2 2 - 1
normal 3 3 - 2
normal 4 4 - 3
del - - 5 4
del - - 6 5
del - - 7 6
del - - 8 7
del - - 9 8
del - - 10 9
del - - 11 10
del - - 12 11
add - - 5 12
normal 13 6 - 13
normal 14 7 - 14
normal 15 8 - 15
normal 23 16 - 17
normal 24 17 - 18
normal 25 18 - 19
del - - 26 20
del - - 27 21
normal 28 19 - 22
normal 29 20 - 23
del - - 30 24
add - - 21 25
normal 31 22 - 26
Now you can use the relative line number to post a comment using the GitHub API.
For my purpose I only needed the relative line numbers for the newly added lines, but using the table above one can get them for deleted lines as well.
Here's the link to the linting project in which I used this: https://github.com/harryi3t/lint-github-pr
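To close the loop, here is a minimal sketch of the final API call, assuming the request npm package; the repo name, PR number, token, commit SHA, and file path are all placeholders, not values from this post:

var request = require('request');

// Post a review comment at a computed diff position
// (position 12 = new line 5 in the example above).
request.post({
  url: 'https://api.github.com/repos/OWNER/REPO/pulls/PR_NUMBER/comments',
  headers: {
    'User-Agent': 'lint-bot',                     // GitHub requires a User-Agent header
    'Authorization': 'token YOUR_GITHUB_TOKEN'    // placeholder token
  },
  json: {
    body: 'ESLint: unexpected console statement', // the comment text
    commit_id: 'HEAD_COMMIT_SHA_OF_THE_PR',       // placeholder SHA
    path: 'test.js',
    position: 12                                  // relative line computed above
  }
}, function (err, res, body) {
  console.log(err || res.statusCode, body);
});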

$pull operation in MongoDB not working for me

I have a document with the following key-array pair:
"home" : [
"Kevin Garnett",
"Paul Pierce",
"Rajon Rondo",
"Brandon Bass",
" 5 sec inbound",
"Kevin Seraphin"
]
I want to remove the element " 5 sec inbound" from the array and use the following command (in the MongoDB shell):
>coll.update({},{"$pull":{"home":" 5 sec inbound"}})
This is not working as verified by a query:
>coll.findOne({"home":/5 sec inbound/})
"home" : [
"Kevin Garnett",
"Paul Pierce",
"Rajon Rondo",
"Brandon Bass",
" 5 sec inbound",
"Kevin Seraphin"
]
Any help would be greatly appreciated!
That very same statement works for me:
> db.test.insert({"home" : [
... "Kevin Garnett",
... "Paul Pierce",
... "Rajon Rondo",
... "Brandon Bass",
... " 5 sec inbound",
... "Kevin Seraphin"
... ]})
> db.test.find({"home":/5 sec inbound/}).count()
1
> db.test.update({},{"$pull":{"home":" 5 sec inbound"}})
> db.test.find({"home":/5 sec inbound/}).count()
0
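One caveat in case the very same statement still does nothing for you: update() only modifies the first document matching the query unless you pass {multi: true}. A variant covering that case (using the same test collection as above):

> db.test.update({}, {"$pull": {"home": " 5 sec inbound"}}, {multi: true})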

How do I get mongoimport to work with complex json data?

Trying to use the built-in mongoimport utility with MongoDB...
I might be blind, but is there a way to import complex JSON data? For instance, say I need to import instances of the following object: { "bob": 1, "dog": [ 1, 2, 3 ], "beau": { "won": "ton", "lose": 3 } }.
I'm trying the following, and it looks like it loads everything into memory but nothing actually gets imported into the db:
$ mongoimport -d test -c testdata -vvvv -file ~/Downloads/jsondata.json
connected to: 127.0.0.1
Tue Aug 10 17:38:38 ns: test.testdata
Tue Aug 10 17:38:38 filesize: 69
Tue Aug 10 17:38:38 got line:{ "bob": 1, "dog": [ 1, 2, 3 ], "beau": { "won": "ton", "lose": 3 } }
imported 0 objects
Any ideas on how to get the json data to actually import into the db?
I did some testing and it looks like you need to have an end-of-line character at the end of the file. Without the end-of-line character, the last line is read but isn't imported.
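If that is the culprit, appending a trailing newline is enough. A quick shell sketch (the path mirrors the question; note that current mongoimport releases spell the flag --file):

$ tail -c 1 ~/Downloads/jsondata.json | od -c    # inspect the file's final byte
$ echo '' >> ~/Downloads/jsondata.json           # append the missing newline
$ mongoimport -d test -c testdata --file ~/Downloads/jsondata.json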