How to extract and import a subsection of an OrientDB graph


What I want to do
I would like to be able to run a query like
traverse * from Location while $depth < 5 limit 100
and then import the resulting vertices and edges into a new database with the same schema as the first.
What I've achieved
I have a way of copying the schema by running
export database TestDB -includeClusterDefinitions=false -includeSecurity=false -includeRecords=false -includeIndexDefinitions=false -includeManualIndexes=false
and then importing it into a new database.
Ideas
I've looked at OETL but can't figure out how to get it to do what I need.

So my approach to do this instead was to delete everything in the DB that wasn't in my traverse query.
delete vertex V where @rid not in (
select @rid from (
traverse * from Location while $depth < 5 limit 100
)
)
This meant that what was left was the schema I wanted and the subset of the DB I wanted to export.
So a full export
export database TestDB
gave me what I wanted.

Related

Pivot function without manually typing values in `for in`?

Documentation provides an example of using the pivot() function.
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?

I don't think this can be done in a simple single query. This would mean that the query compiler would need to work without knowing how many output columns will be produced. I don't think it can do that.
You can do this in multiple queries: use one query to create the list of partnames, and then use that list to "generate" a second query that populates the IN list. So something needs to issue these queries and generate the second one. This can be some code external to Redshift (lots of options) or a stored procedure in Redshift. This code, no matter where it lives, should take into account that Redshift has a maximum number of columns: 1,600.
The Redshift docs are fairly good on the topic of dynamic SQL for stored procedures. The EXECUTE statement will be used to fire off the second query in a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
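As a rough illustration of the "external code" option, here is a minimal Python sketch that builds the second query from a list of partnames. It assumes the list was already fetched with something like select distinct partname from part; the function names build_pivot_query and quote_literal are hypothetical, and running the generated text against Redshift is left to whatever client you use.

```python
MAX_REDSHIFT_COLUMNS = 1600  # column limit mentioned in the answer above


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_pivot_query(partnames):
    """Return the PIVOT query text with one IN-list entry per distinct partname."""
    # Deduplicate and sort so the output column order is predictable.
    distinct = sorted(set(partnames))
    if not distinct:
        raise ValueError("no partnames: the IN list cannot be empty")
    if len(distinct) > MAX_REDSHIFT_COLUMNS:
        raise ValueError("too many distinct partnames for one result set")
    in_list = ", ".join(quote_literal(name) for name in distinct)
    return (
        "SELECT *\n"
        "FROM (SELECT partname, price FROM part) PIVOT (\n"
        f"AVG(price) FOR partname IN ({in_list})\n"
        ");"
    )


if __name__ == "__main__":
    # Hypothetical input: in practice this list comes from the first query.
    print(build_pivot_query(["prop", "rudder", "wing"]))
```

The same string-building logic could live in a stored procedure instead, with EXECUTE running the generated statement.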

Why is TABLE not POPULATING in V$IM_SEGMENTS even after scanning?

So I have added a table to inmemory and have scanned the table after that. But it is still not appearing in V$IM_SEGMENTS. In EXPLAIN PLAN it is showing INMEMORY ACCESS FULL. So not sure if it is using the column store.
Did these:
ALTER TABLE <table_name> INMEMORY;
SELECT * FROM <table_name>;
SELECT * FROM V$IM_SEGMENTS;
no rows

To start with, inmemory_size should be at least 100M.
The following command should show an appropriate size value for the inmemory_size parameter:
show parameter inmemory_size
Loading of table segments into the in-memory area kicks in when there is a full scan on the table or when the inmemory priority clause is other than none, so we need to be sure the select query you ran went through the table access full path.
One more way to initiate a full table scan is to do select count(*) from table.
Or you can use the populate procedure from the dbms_inmemory package to load the table manually into the in-memory area.
Example usage (for user inmem_user, table t1):
exec dbms_inmemory.populate('INMEM_USER','T1');
One more thing to consider when querying v$im_segments: the bytes_not_populated and populate_status columns should also be checked for correctness.
When v$im_segments returns rows, bytes_not_populated should be 0 and populate_status should be COMPLETED.
More information about in-memory population can be found here

Hive Table is empty?

What is the fastest way to check if a table has any records in Hive?
I so far have come across these approaches:
Do a SELECT count(*) FROM <table_name>, I find this to be slow.
Do a show tblproperties <db.table_name>("numRows");, I find that these give -1 if ANALYZE TABLE isn't run on table before. Hence would require ANALYZE TABLE .. to be run before SHOW TBLPROPERTIES ..
Do a SELECT * FROM <table_name> limit 1. I find this to be the most efficient way.
Are there better ways to do this?
(I just want to check if Hive table has at least one record)

This is as far as I know:
Hive table is partitioned:
1) find the location of the table
desc formatted <tablename>
2) compute the file size in hdfs
hdfs dfs -du -h <location of table>
Hive table is not partitioned:
1) show tblproperties <db.table_name>
2) find numRows
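The file-size check above can be sketched in Python. This is only an illustration under stated assumptions: the text passed in is the output of hdfs dfs -du run without -h, so the first column of each line is a plain byte count. The helper names and the sample paths are hypothetical. A nonzero size suggests, but does not strictly prove, that the table has rows, since some file formats store headers or footers even when no rows are present.

```python
def total_bytes_from_du(du_output: str) -> int:
    """Sum the first column (size in bytes) of `hdfs dfs -du` output lines."""
    total = 0
    for line in du_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        total += int(fields[0])
    return total


def table_looks_empty(du_output: str) -> bool:
    """Treat a table as empty when its files add up to zero bytes."""
    return total_bytes_from_du(du_output) == 0


if __name__ == "__main__":
    # Hypothetical output for a table with two partition directories.
    sample = "1024  3072  /warehouse/db/t/part=1\n0  0  /warehouse/db/t/part=2\n"
    print(table_looks_empty(sample))
```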

Get all tables from all databases by a query

I want to get a list of all the tables in all the DBs like: db_name, table_name, card
I tried sysibm.tables, syscat.tables & sysibm.systables, but they are all relevant only to the current DB I'm connected to.
What I'm basically looking for is the equivalent of DBA_TABLES / CDB_TABLES in Oracle.

How to list tables accessible via database links?

I have access to a database, and I can get all tables/columns accessible to me just by using:
select * from ALL_TAB_COLUMNS
I can also access some tables using "@", which as I understand it is the database link mechanism, like this:
select * from aaa.bbb_ddd@ffgh where jj = 55688
where aaa.bbb_ddd@ffgh corresponds to some table with a column jj
BUT I don't see this aaa.bbb_ddd@ffgh table in ALL_TAB_COLUMNS.
How can I list all tables (and the columns inside them) accessible to me via these database links (or similar)?

You can't easily get all columns accessible via all database links; you can get all columns accessible via one database link by querying ALL_TAB_COLUMNS on the remote database:
select * from all_tab_columns@<remote_server>
where <remote_server> in your example would be ffgh.
If you want to get this same information for all database links in your current schema, you'd either have to manually enumerate them and UNION the results together:
select * from all_tab_columns@dblink1
union all
select * from all_tab_columns@dblink2
Or, do something dynamically.
As Justin says, it's clearer if you add which database the data is coming from; you can do this either by just writing it in the query:
select 'dblink1' as dblink, a.* from all_tab_columns@dblink1 a
union all
select 'dblink2', a.* from all_tab_columns@dblink2 a
Or by using an Oracle built-in to work it out, for example the GLOBAL_NAME table (there are lots more ways):
select db1g.global_name, db1a.*
from all_tab_columns@dblink1 db1a
cross join global_name@dblink1 db1g
union all
select db2g.global_name, db2a.*
from all_tab_columns@dblink2 db2a
cross join global_name@dblink2 db2g
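For the "do something dynamically" option, one possibility is to generate the UNION ALL text from a list of link names. The Python sketch below only builds the query string, using Oracle's table@dblink syntax. It assumes you already have the link names (for instance from the DB_LINK column of ALL_DB_LINKS); the function name build_dblink_union and the validation pattern are hypothetical choices for illustration.

```python
import re

# Conservative pattern for a link name; dots allow domain-qualified links.
_LINK_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_$#.]*$")


def build_dblink_union(dblinks):
    """Build one UNION ALL query over all_tab_columns for each database link."""
    if not dblinks:
        raise ValueError("at least one database link is required")
    parts = []
    for link in dblinks:
        # Reject anything that does not look like a plain link name,
        # because the name is spliced directly into the SQL text.
        if not _LINK_NAME.match(link):
            raise ValueError(f"unexpected database link name: {link!r}")
        parts.append(
            f"select '{link}' as dblink, a.* from all_tab_columns@{link} a"
        )
    return "\nunion all\n".join(parts)


if __name__ == "__main__":
    print(build_dblink_union(["dblink1", "dblink2"]))
```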