Cypher query equivalent for neo4j-import

I'm trying to create about 27 million relationships along with 15 million nodes. Initially I was using Cypher, but it was taking a lot of time, so I switched to the neo4j-import tool.
I'm not sure whether the result of the Cypher query is the same as that of neo4j-import.
My Cypher query was:
LOAD CSV FROM "file://dataframe6.txt" AS line FIELDTERMINATOR " "
MERGE (A:concept {name: line[0]})
WITH line, A
MERGE (B:concept {name: line[1]})
WITH B, A
MERGE (A)-[:test]->(B);
Content of dataframe6:
C0000005,C0036775,RB_
C0000039,C0000039,SY_sort_version_of
C0000039,C0000039,SY_entry_version_of
C0000039,C0000039,SY_permuted_term_of
C0000039,C0001555,AQ_
C0000039,C0001688,AQ_
My neo4j-import script:
neo4j-import --into graph.db --nodes:concept "nheader,MRREL-nodes" --relationships "rheader,MRREL-relations" --skip-duplicate-node true
rheader : :START_ID,:END_ID,:TYPE
nheader : :ID,name
MRREL-nodes :
C0000005,C0000005
C0000039,C0000039
C0000052,C0000052
C0036775,C0036775
C0001555,C0001555
MRREL-relations:
C0000005,C0036775,RB_
C0000039,C0000039,SY_sort_version_of
C0000039,C0000039,SY_entry_version_of
C0000039,C0000039,SY_permuted_term_of
C0000039,C0001555,AQ_
C0000039,C0001688,AQ_
Somehow I don't see the same result.

[EDITED]
If you want your relationships to have dynamically assigned types, then you need to change your Cypher code to make use of line[2] to specify the relationship type (e.g., via the APOC procedure apoc.create.relationship). It is currently always using test as the type.
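For example, here is a sketch of a dynamically typed version (this assumes the APOC plugin is installed, and uses a comma terminator to match the sample rows above):
LOAD CSV FROM "file://dataframe6.txt" AS line FIELDTERMINATOR ","
MERGE (A:concept {name: line[0]})
MERGE (B:concept {name: line[1]})
WITH A, B, line
CALL apoc.create.relationship(A, line[2], {}, B) YIELD rel
RETURN count(rel);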
If, instead, you actually wanted all the relationships imported by neo4j-import to have the same test type, then you need to use the right syntax.
Try removing ",:TYPE" from rheader, and use this import command line ( --relationships has been changed to --relationships:test):
neo4j-import --into graph.db --nodes:concept "nheader,MRREL-nodes" --relationships:test "rheader,MRREL-relations" --skip-duplicate-node true

Access locally scoped variables from within a string using parse or value (KDB / Q)

The following lines of Q code all throw an error, because when the statement "local" is parsed, the local variable is not in the correct scope.
{local:1; value "local"}[]
{[local]; value "local"}[1]
{local:1; eval parse "local"}[]
{[local]; eval parse "local"}[1]
Is there a way to reach the local variable from inside the parsed string?
Note: This is a simplification of the actual problem I'm grappling with, which is to write a function that executes a query, accepting a list of columns which it should return. I imagine the finished product looking something like this:
getData:{[requiredColumns;condition]
value "select ",(", " sv string[requiredColumns])," from myTable where someCol=condition"
}
The condition parameter in this query is the one that isn't recognised, and I do realise I could append its value rather than reference it inside the string, but the real query uses lots of local variables, including tables, so it's not as easy as just pulling all the variables out of the string before calling value on it.
I'm new to KDB and Q, so if anyone has a better way to achieve the same effect, I'm happy to be schooled on the proper way to achieve this outcome in Q. I would still be interested to know whether the variable access thing is possible, though.
In the first example, you are right that local is not within the correct scope, as value is looking for the global variable local.
One way to get around this is to use a namespace: the variable is defined globally, but can only be accessed through that namespace. In the modified example below, I have defined local in the .ns namespace:
{.ns.local:1; value ".ns.local"}[]
For the problem you are facing with selecting, if requiredColumns is a symbol list of columns you can just use the take operator # to select them.
getData:{[requiredColumns] requiredColumns#myTable}
For more advanced queries using variables, you may have to use the functional select form, explained here. This will allow you to include variables in the where and by clauses of the select statement.
The same example in functional form would be (no by clause, only select and where):
getData:{[requiredColumns;condition] requiredColumns:(), requiredColumns;
?[myTable;enlist (=;`someCol;condition);0b;requiredColumns!requiredColumns]}
The first line ensures that requiredColumns is a list even if the user enters a single column name.
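As a quick illustration of that coercion, in a q session:
q) (), `id            / an atom is promoted to a one-item list
,`id
q) (), `id`val        / an existing list is returned unchanged
`id`val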
value looks for variables in the global scope; that's why you are getting an error. You can use local variables directly, as you are already doing elsewhere in your function.
Your function is mostly correct; it just needs a slight correction to append condition (I have mentioned that below). However, a better approach would be to use a functional select in this case.
Using functional select:
q) t:([]id:`a`b; val:3 4)
q) gd: {?[`t;enlist (=;`val;y);0b;((),x)!(),x]}
q) gd[`id;3] / for single column
Output:
id
--
a
q) gd[`id`val;3] / for multiple columns
In case your condition column is of type symbol, then enlist your condition value like:
q) gd: {?[`t;enlist (=;`id;y);0b;((),x)!(),x]}
q) gd[`id;enlist `a]
You can use parse to get a functional form of qsql queries:
q) parse " select id,val from t where id=`a"
?
`t
,,(=;`id;,`a)
0b
`id`val!`id`val
Using string concatenation (your function):
q)getData:{[requiredColumns;condition] value "select ",(", " sv string[requiredColumns])," from t where id=", .Q.s1 condition}
q) getData[enlist `id;`a] / for single column
q) getData[`id`val;`a] / for multi columns

How to insert similar value into multiple locations of a psycopg2 query statement using dict? [duplicate]

I have a Python script that runs a pgSQL file through SQLAlchemy's connection.execute function. Here's the block of code in Python:
results = pg_conn.execute(sql_cmd, beg_date = datetime.date(2015,4,1), end_date = datetime.date(2015,4,30))
And here's one of the areas where the variable gets inputted in my SQL:
WHERE
( dv.date >= %(beg_date)s AND
dv.date <= %(end_date)s)
When I run this, I get a cryptic python error:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) argument formats can't be mixed
…followed by a huge dump of the offending SQL query. I've run this exact code with the same variable convention before. Why isn't it working this time?
I encountered a similar issue to Nikhil's. I had a query with LIKE clauses which worked until I modified it to include a bind variable, at which point I received the following error:
DatabaseError: Execution failed on sql '...': argument formats can't be mixed
The solution is not to give up on the LIKE clause; it would be pretty surprising if psycopg2 simply didn't permit LIKE clauses. Rather, we can escape the literal % as %%. For example, the following query:
SELECT *
FROM people
WHERE start_date > %(beg_date)s
AND name LIKE 'John%';
would need to be modified to:
SELECT *
FROM people
WHERE start_date > %(beg_date)s
AND name LIKE 'John%%';
More details are in the psycopg2 docs: http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
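Putting it together, here is a minimal sketch (the table, columns, and connection string are placeholders based on the example above) of a parameterized query that also contains a literal LIKE wildcard:
import datetime
import psycopg2

conn = psycopg2.connect("dbname=test")  # placeholder connection string
cur = conn.cursor()
cur.execute(
    """
    SELECT *
    FROM people
    WHERE start_date > %(beg_date)s
      AND name LIKE 'John%%'  -- literal % doubled so it is not read as a placeholder
    """,
    {"beg_date": datetime.date(2015, 4, 1)},
)
rows = cur.fetchall()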
As it turned out, I had used a SQL LIKE operator in the new SQL query, and its % wildcard was interfering with the parameter escaping. For instance:
dv.device LIKE 'iPhone%' or
dv.device LIKE '%Phone'
Another answer offered a way to un-escape and re-escape, which I felt would add unnecessary complexity to otherwise simple code. Instead, I used pgSQL's ability to handle regex to modify the SQL query itself. This changed the above portion of the query to:
dv.device ~ E'iPhone.*' or
dv.device ~ E'.*Phone$'
So for others: you may need to change your LIKE operators to regex '~' to get it to work. Just remember that it'll be WAY slower for large queries. (More info here.)
For me, it turned out I had a % in a SQL comment:
/* Any future change in the testing size will not require
a change here... even if we do a 100% test
*/
This works fine:
/* Any future change in the testing size will not require
a change here... even if we do a 100pct test
*/

How to use a FOR EACH loop in Progress?

Basically I'm trying to do a simple join. I'm a beginner in Progress, and even though I keep reading the same things, my problem is still unresolved.
I'm using unixODBC to communicate with my database, and it works like a charm for simple commands like: SELECT * FROM PUB."Art"
I understood that I have to do something that looks like this to join two tables:
FOR EACH PUB."Art" WHERE (PUB."Art".IdArt = 16969) ,
EACH PUB."ArtDet" WHERE (PUB."ArtDet".IdArt = PUB."Art".IdArt)
END
But this only returns: [ISQL]ERROR: Could not SQLPrepare
I then tried to simplify things with:
for each PUB."Art": display PUB."Art".IdArt end.
I tried putting a colon (or not) after the FOR EACH loop, using a period / comma, etc., but apparently I never hit the right syntax, or I'm missing something needed to execute this command.
Can anyone advise me?
Thanks a lot!
You appear to be mixing SQL and 4GL syntax.
"FOR EACH" is 4GL. The SQL equivalent is "SELECT".
(If you are using the 4GL you do not need the "PUB" prefix, and quoting table and field names will not work.)
To do a join with SQL (or the 4GL), use a "," between the table names. For SQL, your syntax would look something like:
SELECT * from PUB."Art", PUB."ArtDet"
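If, on the other hand, you run the query inside a Progress ABL (4GL) session rather than over ODBC, a rough sketch of the same join (table and field names taken from the question) would be:
FOR EACH Art WHERE Art.IdArt = 16969 NO-LOCK,
    EACH ArtDet WHERE ArtDet.IdArt = Art.IdArt NO-LOCK:
    DISPLAY Art.IdArt ArtDet.IdArt.
END.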
Gory details regarding WHERE clauses, SQL INNER & OUTER joins etc. can be found in the online documentation:
https://community.progress.com/community_groups/openedge_general/w/openedgegeneral/1329.openedge-product-documentation-overview
You will want to navigate to your specific release and then find the "SQL" guide.

Assigning a whole data structure its null indicator array

Some context before the question.
Imagine a file FileA with around 50 fields of different types. Instead of every program using the file directly, I tried having a service program, so the file could only be accessed through that service program. The programs calling the service would then receive a data structure based on the file structure, as an ExtName. I use SQL to retrieve the information, so, basically, the procedure goes like this:
Data structure shared by the service program:
D FileADS E DS ExtName(FileA) Qualified
Procedure called by programs :
P getFileADS B Export
D PI N
D PI_IDKey 9B 0 Const
D PO_DS LikeDS(FileADS)
D LocalDS E DS ExtName(FileA) Qualified
D NullInd S 5i 0 Dim(50) <-- Since 50 fields in FileA
//Code
Clear LocalDS;
Clear PO_DS;
exec sql
SELECT *
INTO :LocalDS :nullind
FROM FileA
WHERE FileA.ID = :PI_IDKey;
If SqlCod <> 0;
Return *Off;
EndIf;
PO_DS = LocalDS;
Return *On;
P getFileADS E
So, that procedure will return a datastructure filled with a record from FileA if it finds it.
Now my question: is there any way I can assign %nullind(field) = *On without specifying each of the 50 fields of my file?
Something like a loop:
i = 1;
DoW (i <= 50);
if nullind(i) = -1;
%nullind(datastructure.field) = *On;
endif;
i++;
EndDo;
Because, let's face it, it would be a pain to list every field of every file each time.
I know a simple chain(n) could do the trick:
chain(n) PI_IDKey FileA FileADS;
but I really was looking to do it with SQL.
Thank you for your advice!
OS version: 7.1
First, you'll be better off in the long run by eliminating SELECT * and supplying a SELECT list of the 50 field names.
Next, consider these two web pages -- Meaningful Names for Null Indicators and Embedded SQL and null indicators. The first shows an example of assigning names to each null indicator to match the associated field names. It's just a matter of declaring a based DS with names, based on the address of your null indicator array. The second points out how a null indicator array can be larger than needed, so future database changes won't affect results. (Bear in mind that the page shows a null array of 1000 elements, and the memory is actually relatively tiny even at that size. You can declare it smaller if you think it's necessary for some reason.)
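As a rough free-form sketch of that "meaningful names" idea (the subfield names are hypothetical, and this assumes a compiler level that supports free-form declarations, as used in another answer below):
dcl-s NullInds int(5) dim(50);                 // indicator array passed to SQL
dcl-s pNullInds pointer inz(%addr(NullInds));

dcl-ds NullIndNames based(pNullInds) qualified;
  name   int(5);                               // null indicator for FileA field NAME
  amount int(5);                               // null indicator for FileA field AMOUNT
  // ... one subfield per FileA field, in the file's field order
end-ds;

// after SELECT ... INTO :LocalDS :NullInds you can test, e.g.:
// if NullIndNames.amount = -1;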
You're creating a proc that you'll only write once. It's not worth saving the effort of listing the 50 fields. Maybe if you had many programs using this proc and you had to create the list each time it'd be a slight help to use SELECT *, but even then it's not a great idea.
A matching template DS for the 50 data fields can be defined in the /COPY member that will hold the proc prototype. The template DS will be available in any program that brings the proc prototype in. Any program that needs to call the proc can simply specify LIKEDS referencing the template to define its version in memory. The template DS should probably include the QUALIFIED keyword, and programs would then use their own DS names as the qualifying prefix. The null indicator array can be handled similarly.
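As a sketch, the /COPY member could hold something like this (names are hypothetical):
// template only: no storage is allocated, callers define their own copy
dcl-ds FileA_t extname('FILEA') qualified template;
end-ds;

and a calling program would then define its own instance with:
dcl-ds myFileA likeds(FileA_t);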
However, it's not completely clear what your actual question is. You show an example loop and ask if it'll work, but you don't say if you had a problem with it. It's an array, so a loop can be used much like you show. But it depends on what you're actually trying to accomplish with it.
For old school RPG, just include the nulls in the data structure populated by the SELECT statement.
select col1, ifnull(col1), col2, ifnull(col2), etc. into :dsfilewithnull from f where f.id = :id;
For old school RPG that can't handle nulls, remove them with the SELECT statement.
select coalesce(col1, 0), coalesce(col2, ' '), coalesce(col3, :lowdate) into :dsfile from f where f.id = :id;
The second method would be easier to use in a legacy environment.
Pass the key by value to the procedure so you can use it like a built-in function.
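For instance, a hypothetical free-form prototype (the names and the key's type are assumptions) that passes the key by VALUE so callers can hand in a literal or expression:
dcl-pr getFileADS ind;
  idKey packed(9:0) value;     // key passed by value, like a built-in
  outDS likeds(FileADS);
end-pr;

// usage: if getFileADS(16969 : myFileADS);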
One answer to your question would be to make the array part of a data structure, and assign *all'0' to the data structure.
dcl-ds nullIndDs;
nullInd Ind Dim(50);
end-ds;
nullIndDs = *all'0';
The answer by jmarkmurphy is an example of assigning all zeros to an array of indicators. For the example that you show in your question, you can do it this way:
D NullInd S 5i 0 dim(50)
/free
NullInd(*) = 1 ;
Nullind(*) = 0 ;
*inlr = *on ;
return ;
/end-free
That's a complete program that you can compile and test. Run it in debug and stop at the first statement. Display NullInd to see the initial value of its elements. Step through the first statement and display it again to see how the elements changed. Step through the next statement to see how things changed again.
As for "how to do it in SQL", that part doesn't make sense. SQL sets the values automatically when you FETCH a row. Other than that, the array is used by the host language (RPG in this case) to communicate values back to SQL. When a SQL statement runs, it again automatically uses whatever values were set. So, it either is used automatically by SQL for input or output, or is set by your host language statements. There is nothing useful that you can do 'in SQL' with that array.

DataFrame keying using the pandas groupby method

I'm new to pandas and trying to learn how to work with it. I'm having a problem when trying to apply an example I saw in one of Wes's videos and notebooks to my own data. I have a csv file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I load it into a DataFrame and then group it by "filePath" and "vp"; the code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now I'm trying to access the index like a dict, as I saw in examples, but when I do
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do succeed in getting a result when I use the filePath, but as I understand it (and saw in previous examples) I should be able to use the vp keys as well. Isn't that so?
Sorry if this is a trivial one; I just can't understand why it works in one example but not in the other.
Rutger, you are not correct. It is possible to partially index a MultiIndex Series; I simply did it the wrong way.
The index's first level is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name I have multiple vps.
Now, this is correct:
res['E:\Audio\7168965711_5601_4.wav']
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You group by two columns and therefore get a MultiIndex in return. This means you also have to slice using those two columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it into a DataFrame you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a series, boolean indexing can help, something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexible; you could test for multiple values at once, for example:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]