Data seems to have loaded but nothing retrieved with Cypher queries - import

I'm new to Neo4j and to posting here, so forgive me if I have not followed established conventions or norms in reporting the following issue.
I loaded some data into the Neo4j database (v2.0.1) via a text file using the recommended approach from this link.
Everything seems to have loaded OK (see the snippet from the console below), but when I try to query the database, nothing is returned. From the Data Browser I enter the Cypher command
Match (n) Return n
Nothing is returned.
The Neo4j Dashboard shows over 500 nodes, 3000 properties, and 900 relationships. The REST API also shows the nodes, relationships, and attributes (click on the bubbles icon in the upper left). Clicking on any of these nodes, relationships, or attributes triggers a Cypher query, but nothing is returned.
The same MATCH (n) Return n query in the REST API also returns no nodes.
Is the data loading correctly? What am I doing wrong? I suspect there is some little trick I am just not getting. Any assistance would be greatly appreciated.
I did run a few of the Cypher queries one at a time in the Data Browser (and REST API), and the nodes were created correctly, and I could query them. So I don't think there is an issue with the Cypher query itself (though I could be wrong).
******* Console feedback during load process **********
mac-pro:neo4j$ cat import.txt | bin/neo4j-shell -config conf/neo4j.properties -path data/graph.db
NOTE: Local Neo4j graph database service at 'data/graph.db'
Welcome to the Neo4j Shell! Enter 'help' for a list of commands
neo4j-sh (?)$ BEGIN
Transaction started
neo4j-sh (?)$ CREATE (n:Employee {ID:0, Name:'X', CompanyID:'211051', JobTitle:'SPECIALIST CONTRACTOR'…<other properties omitted for brevity>...});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 13
Labels added: 1
1906 ms
neo4j-sh (?)$ CREATE (n:Employee {ID:2, Name:'Y', CompanyID:'211036', JobTitle:'PROGRM. MNGR LEVEL CONTRACTOR', …<other properties omitted for brevity>...});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 13
Labels added: 1
.
.
.
<rest of the lines omitted>
.
.
.

Related

pq: [parent] Data too large

I'm using Grafana to visualize some data stored in CrateDB in different panels.
Some of my dashboards work correctly, but there are 3 specific dashboards (created by someone on my team) that, at certain times of the day, stop showing data (No Data) and display the following error:
db query error: pq: [parent] Data too large, data for [fetch-1] would be [512323840/488.5mb], which is larger than the limit of [510027366/486.3mb], usages [request=0/0b, in_flight_requests=0/0b, query=150023700/143mb, jobs_log=19146608/18.2mb, operations_log=10503056/10mb]
Honestly, I would like to understand what it means and how I can fix it.
Any help you can give me would be deeply appreciated.
What I tried
17 SQL statements of the form:
SELECT
time_index AS "time",
entity_id AS metric,
v1_ps
FROM etsm
WHERE
entity_id = 'SM_B3_RECT'
ORDER BY 1,2
for 17 different entities.
What I expected
I expect to receive the data corresponding to each of the SQL statements so that each one can be graphed.
The result
As a result, some of the statements receive no data and show the warning message I shared:
db query error: pq: [parent] Data too large, data for [fetch-1] would be [512323840/488.5mb], which is larger than the limit of [510027366/486.3mb], usages [request=0/0b, in_flight_requests=0/0b, query=150023700/143mb, jobs_log=19146608/18.2mb, operations_log=10503056/10mb]
As an additional note, the panel is configured to refresh every 15 minutes, but no matter how many times I manually refresh it, the statements that receive data change each time.
Example: I refresh the panel and SQL statements A, B and C get data while the others don't; I refresh again and statements D, H and J receive data while the others don't (in a seemingly random pattern).
Other additional information:
I have access to the database that Grafana queries, and the data is there.
You don't have a time condition, so the query selects and processes all records every time, and you are hitting limits of your DB (e.g. the size of processed data). Add a time condition so that only a fraction of all records is returned.
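For example, with Grafana's PostgreSQL-compatible data source (which the pq prefix in the error suggests is being used for CrateDB), the built-in $__timeFilter macro restricts the query to the panel's current time range. A sketch based on the query from the question:

SELECT
  time_index AS "time",
  entity_id AS metric,
  v1_ps
FROM etsm
WHERE
  entity_id = 'SM_B3_RECT'
  AND $__timeFilter(time_index)
ORDER BY 1, 2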

Executing the query using bq command line in Google Big Query

I execute a query using the Python script below and the destination table gets populated with 2,564,691 rows. When I run the same query in the Google BigQuery console, it returns 17,379,353 rows (the query is exactly the same). I was wondering whether there is some issue with the script below. I'm not sure whether --replace in bq query replaces the previous result set instead of appending to it.
Any help would be appreciated.
import time

dateToday = time.strftime("%Y/%m/%d")
dateToday1 = dateToday.replace('/', '')
# Raw string so the backslashes in the Windows path are not treated as escape sequences
commandStr = r"type C:\Users\query.txt | bq query --allow_large_results --replace --destination_table table:dataset1_%s -n 1" % (dateToday1)
In the Web UI you can use the Query History option to navigate to the respective queries.
After you locate them, you can expand the respective entries and see exactly which query was executed.
I am more than sure that just by comparing the query texts you will see the source of the "discrepancy" right away!
Added:
In Query History you can see not only the Query Text but also all the configuration properties that were used for the respective query, such as the Write Preference, among others. So even if the query text is the same, you can spot a difference in configuration that will give you a clue.

How to delete data from an RDBMS using Talend ELT jobs?

What is the best way to delete from a table using Talend?
I'm currently using a tELTJDBCoutput with the action on Delete.
It looks like Talend always generates a DELETE ... WHERE EXISTS (<your generated query>) query.
So I am wondering if we have to use the field values or just put a fixed value of 1 (even in only one field) in the tELTmap mapping.
To me, putting real values looks useless, since with WHERE EXISTS only the WHERE clause matters.
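For reference, the generated statement in my job seems to have roughly this shape (a sketch; SOME_TABLE and JS_OPP are the tables from my job, the mapped fields end up in the SELECT list inside the EXISTS, and the exact SQL depends on the mapping):

DELETE FROM SOME_TABLE
WHERE EXISTS (
  SELECT JS_OPP.CODE_COUNTRY, JS_OPP.FK_USER
  FROM JS_OPP
  WHERE SOME_TABLE.CODE_COUNTRY = JS_OPP.CODE_COUNTRY
    AND SOME_TABLE.FK_USER = JS_OPP.FK_USER
)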
Is there a better way to delete using ELT components?
My current job is set up like so:
The tELTMAP component with real data values looks like:
But I can also do the same thing with the following configuration:
Am I missing the reason why we should put something in the fields?
The following answer is a demonstration of how to perform deletes using ETL operations, where the data is extracted from the database, read into memory, transformed and then fed back into the database. After clarification, the OP specifically wants information on how this would differ for ELT operations.
If you need to delete certain records from a table then you can use the normal database output components.
In the following example, the use case is to take an updated database, check which records are no longer in the new data set compared to the old data set, and then delete the relevant rows from the old data set. This might be used for refreshing data from one live system to a non-live system, or some other use case where you need to manually move data deltas from one database to another.
We set up our job like so:
The job has two tMySqlConnection components that connect to two different databases (potentially on different hosts), one containing our new data set and one containing our old data set.
We then select the relevant data from the old data set and inner join it using a tMap against the new data set, capturing any rejects from the inner join (rows that exist in the old data set but not in the new data set):
We are only interested in the key for the output as we will delete with a WHERE query on this unique key. Notice as well that the key has been selected for the id field. This needs to be done for updates and deletes.
And then we simply need to tell Talend to delete these rows from the relevant table by configuring our tMySqlOutput component properly:
Alternatively you can simply specify some constraint that would be used to delete the records as if you had built the DELETE statement manually. This can then be fed in as the key via a main link to your tMySqlOutput component.
For instance I might want to read in a CSV with a list of email addresses, first names and last names of people who are opting out of being contacted and then make all of these fields a key and connect this to the tMySqlOutput and Talend will generate a DELETE for every row that matches the email address, first name and last name of the records in the database.
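Hypothetically, Talend would then issue one DELETE per row of the CSV, something like the following (the table and column names here are made up purely for illustration):

DELETE FROM contacts
WHERE email = 'jane.doe@example.com'
  AND first_name = 'Jane'
  AND last_name = 'Doe'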
In the first example shown in your question:
you are specifically only selecting (for the deletion) products where SOME_TABLE.CODE_COUNTRY is equal to JS_OPP.CODE_COUNTRY and SOME_TABLE.FK_USER is equal to JS_OPP.FK_USER in your WHERE clause, and the data you send to the delete statement sets CODE_COUNTRY to JS_OPP.CODE_COUNTRY and FK_USER to JS_OPP.FK_USER.
If you were to put a tLogRow (or some other output) directly after your tELTxMap you would be presented with something that looks like:
.------------+-------.
|      tLogRow_1     |
|=-----------+------=|
|CODE_COUNTRY|FK_USER|
|=-----------+------=|
|GBR         |1      |
|GBR         |2      |
|USA         |3      |
'------------+-------'
In your second example:
You are setting CODE_COUNTRY to an integer of 1 (your database will then translate this to a VARCHAR "1"). This would then mean the output from the component would instead look like:
.------------.
| tLogRow_1  |
|=-----------|
|CODE_COUNTRY|
|=-----------|
|1           |
|1           |
|1           |
'------------'
In your use case this would mean that the deletion should only delete the rows where the CODE_COUNTRY is equal to "1".
You might want to test this a bit further though because the ELT components are sometimes a little less straightforward than they seem to be.

Neo4j: MERGE creates duplicate nodes

My database model has users and MAC addresses. A user can have multiple MAC addresses, but a MAC can only belong to one user. If some user sets his MAC and that MAC is already linked to another user, the existing relationship is removed and a new relationship is created between the new owner and that MAC. In other words, a MAC moves between users.
This is a particular instance of the Cypher query I'm using to assign MAC addresses:
MATCH (new:User { Id: 2 })
MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })
WITH new, mac
OPTIONAL MATCH ()-[oldr:MAC_ADDRESS]->(mac)
DELETE oldr
MERGE (new)-[:MAC_ADDRESS]->(mac)
The query runs fine in my tests, but in production, for some strange reason it sometimes creates duplicate MacAddress nodes (and a new relationship between the user and each of those nodes). That is, a particular user can have multiple MacAddress nodes with the same Value.
I can tell they are different nodes because they have different node IDs. I'm also sure the Values are exactly the same because I can do a collect(distinct mac.Value) on them and the result is a collection with one element. The query above is the only one in the code that creates MacAddress nodes.
I'm using Neo4j 2.1.2. What's going on here?
Thanks,
Jan
Are you sure this is the entirety of the queries you're running? MERGE has this really common pitfall where it merges everything that you give it. So here's what people expect:
neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" });
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 1
Labels added: 1
1650 ms
neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" });
+--------------------------------------------+
| No data returned, and nothing was changed. |
+--------------------------------------------+
17 ms
neo4j-sh (?)$ match (mac:MacAddress { Value: "D857EFEF1CF6" }) return count(mac);
+------------+
| count(mac) |
+------------+
| 1 |
+------------+
1 row
200 ms
So far, so good. That's what we expect. Now watch this:
neo4j-sh (?)$ MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })-[r:foo]->(b:SomeNode {label: "Foo!"});
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 2
Relationships created: 1
Properties set: 2
Labels added: 2
178 ms
neo4j-sh (?)$ match (mac:MacAddress { Value: "D857EFEF1CF6" }) return count(mac);
+------------+
| count(mac) |
+------------+
| 2 |
+------------+
1 row
2 ms
Wait, WTF happened here? We specified only the same MAC address again, why is a duplicate created?
The documentation on MERGE specifies that "MERGE will not partially use existing patterns — it’s all or nothing. If partial matches are needed, this can be accomplished by splitting a pattern up into multiple MERGE clauses". So because when we run this path MERGE the whole path doesn't already exist, it creates everything in it, including a duplicate mac address node.
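A sketch of the split-up version of the path MERGE from the example above, which avoids the duplicate:

// Merge each node on its own, then merge the relationship between them
MERGE (mac:MacAddress { Value: "D857EFEF1CF6" })
MERGE (b:SomeNode { label: "Foo!" })
MERGE (mac)-[r:foo]->(b);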
There are frequently questions about duplicated nodes created by MERGE, and 99 times out of 100, this is what's going on.
This is the response I got back from Neo4j's support (emphasis mine):
I got some feedback from our team already, and it's currently known that this can happen in the absence of a constraint. MERGE is effectively MATCH or CREATE - and those two steps are run independently within the transaction. Given concurrent execution, and the "read committed" isolation level, there's a race condition between the two.
The team have done some discussion on how to provided a higher guarantee in the face of concurrency, and do have it noted as a feature request for consideration.
Meanwhile, they've assured me that using a constraint will provide the uniqueness you're looking for.
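For reference, a uniqueness constraint in the Neo4j 2.x Cypher syntax looks like this (note that the existing duplicate nodes have to be merged or removed before the constraint can be created):

CREATE CONSTRAINT ON (m:MacAddress) ASSERT m.Value IS UNIQUE;

With the constraint in place, concurrent MERGEs on MacAddress.Value take a lock on that value instead of racing, so duplicates should no longer be created.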

Understanding MON$STAT_ID in the Firebird monitoring tables

I posted a few weeks back inquiring about the Firebird DB and how to monitor it. Since then I have come up with a nifty script that monitors all of the page reads/writes/fetches/marks. Two of the columns I am monitoring are the MON$STAT_ID and MON$STAT_GROUP fields. These print out a nice number for me; however, I have no way to correlate it and understand what exactly it refers to. I thought printing out the MON$STAT_GROUP would help, but it has yet to assist me in any way...
I have also looked into the RDB$ commands but have found very limited documentation to see if they might assist me in monitoring my database.
So I decided to come here and inquire first off whether I am monitoring my database in a way that others can view the data from page reads/writes/fetches/marks and make an intelligent decision on whether or not the database is performing as expected.
Secondly, would adding RDB$ commands to my script add anything to the value of the data that I will be giving our database folks?
Lastly, and maybe most importantly, is there any way to correlate the MON$STAT_ID fields to an actual table in the database to understand when something is going on that should not be? I am currently monitoring the database every minute, which may be too frequent, but I am getting valid data out. The only question now is how to interpret this data. Can someone give me advice on methods they use/have used in the past that have worked for them?
(NOTE: Running firebird 2.1)
The column MON$STAT_ID in MON$IO_STATS (and MON$RECORD_STATS and MON$MEMORY_USAGE) is the primary key of the record in the monitoring table. Almost all other monitoring tables include a MON$STAT_ID to point to these statistics: MON$ATTACHMENTS, MON$CALL_STACK, MON$DATABASE, MON$STATEMENTS, MON$TRANSACTIONS.
In other words: the statistics apply at the database, attachment, transaction, statement or call (PSQL execution) level. The statistics tables contain a column called MON$STAT_GROUP to discern these types. The values of MON$STAT_GROUP are described in RDB$TYPES (see the query sketch after the list):
0 : DATABASE
1 : ATTACHMENT
2 : TRANSACTION
3 : STATEMENT
4 : CALL
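A quick way to check these descriptions in your own database is to query RDB$TYPES directly, for example:

-- List the MON$STAT_GROUP codes and their names
SELECT RDB$TYPE, RDB$TYPE_NAME
FROM RDB$TYPES
WHERE RDB$FIELD_NAME = 'MON$STAT_GROUP'
ORDER BY RDB$TYPE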
Typically the statistics of level 0 contain all from level 1, level 1 contains all from level 2 for that attachment, level 2 contains all from level 3 for that transaction, level 3 contains all from level 4 for that statement.
As there might be data processed unrelated to the lower level, or a specific attachment, transaction or statement handle has already been dropped, the numbers of the lower level do not necessarily aggregate to the entire number of the higher level.
There is no way to correlate the statistics to a specific table (as this information isn't table related, but - simplified - from executing statements which might cover multiple tables).
As I also commented, I am unsure what you mean with "RDB$ commands". But I am assuming you are talking about RDB$GET_CONTEXT() and RDB$SET_CONTEXT(). You could use RDB$GET_CONTEXT() to obtain the current connection id (SESSION_ID) and transaction id (TRANSACTION_ID). These values can be used for MON$ATTACHMENT_ID and MON$TRANSACTION_ID in the monitoring tables. I don't think the other variables in the SYSTEM namespace are interesting, and those in USER_SESSION and USER_TRANSACTION are all user-defined (and initially those namespaces are empty).
It is far easier to use the CURRENT_CONNECTION and CURRENT_TRANSACTION context variables within a statement. As documented in doc\README.monitoring_tables.txt in the Firebird installation:
System variables CURRENT_CONNECTION and CURRENT_TRANSACTION could be used to select data about the current (for the caller) connection and transaction respectively. These variables correspond to the ID columns of the appropriate monitoring tables.
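For example, a sketch that pulls the page I/O counters for the current attachment (column names as in the Firebird 2.5 monitoring tables):

SELECT io.MON$PAGE_READS, io.MON$PAGE_WRITES, io.MON$PAGE_FETCHES, io.MON$PAGE_MARKS
FROM MON$ATTACHMENTS a
JOIN MON$IO_STATS io ON io.MON$STAT_ID = a.MON$STAT_ID
WHERE a.MON$ATTACHMENT_ID = CURRENT_CONNECTION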
Note: my answer is based on Firebird 2.5.
To present statistics for specific tables I use this SQL (FB 3):
select t.mon$table_name, trim(
       case when r.mon$record_seq_reads > 0 then 'Non index Reads: ' || r.mon$record_seq_reads else '' end ||
       case when r.mon$record_idx_reads > 0 then ' Index Reads: '    || r.mon$record_idx_reads else '' end ||
       case when r.mon$record_inserts   > 0 then ' Inserts: '        || r.mon$record_inserts   else '' end ||
       case when r.mon$record_updates   > 0 then ' Updates: '        || r.mon$record_updates   else '' end ||
       case when r.mon$record_deletes   > 0 then ' Deletes: '        || r.mon$record_deletes   else '' end)
from MON$TABLE_STATS t
join mon$record_stats r on r.mon$stat_id = t.mon$record_stat_id
where t.mon$table_name not starting 'RDB$' and r.mon$stat_group = 2
order by 1