What data type does YCSB load into a database? - ycsb

I am loading data to Cassandra through YCSB using this command -
bin/ycsb load cassandra-10 -p hosts="132.67.105.254" -P workloads/workloada > workloada_res.txt
I just want to know what "sort of data" is loaded using the above command. I mean, is it a single character or a string?

Have you tried to run the same command with the basic switch instead of the cassandra-10 one?
From the documentation:
The basic parameter tells the Client to use the dummy BasicDB layer. [...]
If you used BasicDB, you would see the insert statements for the database [in the terminal]. If you used a real DB interface layer, the records would be loaded into the database.
You will then notice that YCSB generates rowkeys like user###, with ### < recordcount. The columns are named field0 to fieldN with N = fieldcount - 1 and the content of each cell is a random string of fieldlength characters.
recordcount is given in workloada and equals 1000.
fieldcount defaults to 10 and fieldlength to 100, but you can override both in your workload file or using the -p switch.
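For example, a quick way to see exactly what gets generated (a sketch; the basic binding just prints the operations instead of loading them, as quoted above) is to lower the counts with -p and run:
bin/ycsb load basic -P workloads/workloada -p recordcount=5 -p fieldcount=3 -p fieldlength=20
Each printed insert should then show a user### key with columns field0 to field2, each holding a random 20-character string.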

Trying to move Boolean column logic from SQLITE to Postgres

I am trying to move my Python application from my PC (which is using SQLite3) to Heroku (Postgres). I have declared Boolean columns in my tables and, in SQLite, I need to give these a value of 0 or 1. However, I believe in Postgres this needs to be True or False.
Is there a way that this conversion can be done easily, such that I can continue to develop on SQLite and deploy to Postgres without changing the code?
I guess the bottom line is amending the columns to Int and constraining their values to 0 or 1, but is there an easier answer?
# For example with SQLite (adminuser is a column of Users, described as Boolean):
admin_user = Users.query.filter_by(club=form.club.data, adminuser=1)
# For Postgres:
admin_user = Users.query.filter_by(club=form.club.data, adminuser=True)

How to load dynamic data into a Cassandra table? How to read a CSV file with a header as well?

I want to load a CSV file (its columns change) into a Cassandra table.
The file sometimes comes with 10 columns and sometimes with 8; given this, how do I insert the data into the Cassandra table?
Is there any way to load it using Scala or batch commands?
How do I read a CSV file that also has a header?
There are a number of options here really. You could code your own solution using one of the DataStax drivers, or you could use the cqlsh COPY command, or the DataStax Bulk Loader tool.
The fact that your source file changes format throws a bit of a curve ball at you here. Assuming you don't have any control over the files that you have to load, in each case you'll need to create something that first parses the file or transforms it into a common format with the same number of columns.
For example, if you're using the shell you could count the columns using something like awk and then base your actions upon that. A simple example with bash to count the number of columns:
$ cat csv.ex1
apples,bananas,grapes,pineapples
$ cat csv.ex2
oranges,mangos,melons,pears,raspberries,strawberries,blueberries
$ cat csv.ex1 | awk -F "," '{print "num of cols: "NF}'
num of cols: 4
$ cat csv.ex2 | awk -F "," '{print "num of cols: "NF}'
num of cols: 7
Once you have this you should then be able to parse or transform your file accordingly and load into Cassandra like you would with any other csv file.
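As a rough sketch of that idea (the file names, padding logic, keyspace, table and column names below are assumptions, not something from the question), you could branch on the column count and pad the shorter files before handing them to cqlsh COPY:
#!/usr/bin/env bash
# Hypothetical sketch: normalise an incoming CSV to 10 columns, then load it.
file="$1"
cols=$(head -1 "$file" | awk -F "," '{print NF}')

if [ "$cols" -eq 8 ]; then
    # Pad two extra empty fields so every row has 10 columns.
    awk -F "," 'BEGIN{OFS=","} {print $0, "", ""}' "$file" > normalised.csv
else
    cp "$file" normalised.csv
fi

# Load with cqlsh COPY, skipping the header row.
cqlsh -e "COPY ks.mytable (c1,c2,c3,c4,c5,c6,c7,c8,c9,c10) FROM 'normalised.csv' WITH HEADER = TRUE"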

OnComponentOrder flow and tMap connections in Talend

I have the following flow:
1 component that needs to be executed to extract a certain timestamp from MySQL
3 MySQL inputs that need to use that timestamp
1 tMap which needs to get the 3 MySQL inputs
However, I am not allowed to connect the 3 MySQL inputs to the single tMap because they depend on the first component (through OnComponentOk) but in a different order. How do I orchestrate this sort of situation?
You could execute a query and set a global variable using the tSetGlobalVar component (referencing row1.mydate, for example), then in each of your queries going into tMap, reference the global variable like:
"SELECT ...
FROM ...
WHERE mydate >= '" + (String) globalMap.get("myDate") + "'"
Two subjobs, one for getting the variable and storing it, and another for doing your three queries into tMap, etc.
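For reference, the tSetGlobalVar configuration this assumes is a single key/value entry (the row and column names are examples, not from your job):
Key: "myDate"    Value: row1.mydate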

How to assign csv field value to SQL query written inside table input step in Pentaho Spoon

I am pretty new to Pentaho so my question might sound very novice.
I have written a transformation in which I am using a CSV file input step and a Table input step.
Steps I followed:
Initially, I created a parameter in the transformation properties. The parameter birthdate doesn't have any default value set.
I have used this parameter in the PostgreSQL query in the Table input step in the following manner:
select * from person where EXTRACT(YEAR FROM birthdate) > ${birthdate};
I am reading the CSV file using the CSV file input step. How do I assign the birthdate value which is present in my CSV file to the parameter which I created in the transformation?
(OR)
Could you guide me through the process of assigning the CSV field value directly to the SQL query used in the Table input step, without the use of a parameter?
TLDR;
I recommend using a "database join" step like in my third suggestion below.
See the last image for reference
First idea - Using Table Input as originally asked
Well, you don't need any parameter for that, unless you are going to provide the value for that parameter when asking the transformation to run. If you need to read data from a CSV you can do that with this approach.
First, read your CSV and make sure your rows are ok.
After that, use a select values to keep only the columns to be used as parameters.
In the table input, use a placeholder (?) to determine where to place the data and ask it to run for each row that it receives from the source step.
Just keep in mind that the order of the columns received by the Table input step (the columns coming out of the Select values step) is the same order in which they will be used for the placeholders (?). This should not be a problem with your question, which uses only one placeholder, but keep that in mind as you ramp up using Pentaho.
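For instance, with this approach the Table input query from the question would look something like this (same person table and birthdate column, with the placeholder in place of the parameter; depending on how the CSV field is typed you may need a cast such as ?::integer, as in the note further down):
select * from person where EXTRACT(YEAR FROM birthdate) > ?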
Second idea, using a Database Lookup
This is another approach, in which you can't personalize the query made to the database, but you may experience better performance because you can set an "Enable cache" flag. If you don't need to use a function in your where clause, this is really the recommended option.
Third idea, using a Database Join
That is my recommended approach if you need a function in your where clause. It looks a lot like the Table input approach, but you can skip the Select values step and select which columns to use, repeat the same column a bunch of times, and enable an "outer join" flag that also returns the rows for which the query has no result.
Pro tip: If you feel the transformation is running too slowly, try using multiple copies of the step (documentation here) and obviously make sure the table has the appropriate indexes in place.
Yes, there's a way of assigning it directly without the use of a parameter. Do as follows.
Use a "Block this step until steps finish" step to halt the Table input step until the CSV file input step completes.
Following is how you configure each step.
Note:
The Postgres query should be select * from person where EXTRACT(YEAR FROM birthdate) > ?::integer
Check "Execute for each row" and "Replace variables" in the Table input step.
Select only the birthdate column in the CSV file input step.

How to select a single row in a table in Selenium IDE?

I am testing a web application in Selenium IDE. I want to access a single row of a table, but I am unable to select it. Please tell me what command and target I should write.
Your question is ambiguous.
However, I am assuming that you are trying to fetch the data/text available in a specific row of a table.
You can use the storeText command for this purpose. The syntax is as follows:
Command: storeText
Target: locator
Value: variable_name
I would suggest that you use XPath as the locator. In the above command, variable_name refers to the variable that will store the text fetched by the command.
To select a single row of the table, you need to know how to write the XPath for the required row.
Now, to access the data in each of the rows in the table, you can use the storeText command within a while loop and index that XPath (element locator). As you store the text in the variable on each iteration, you can use the echo command to display it in the log. The required data can then be extracted from the log using the grep command on Linux.
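For example, a minimal sketch that fetches the second row of a table and echoes it (the table id and row index are assumptions about your page, in the same Command/Target/Value layout as above):
Command: storeText
Target: xpath=//table[@id='results']/tbody/tr[2]
Value: rowText
Command: echo
Target: ${rowText}
Value: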