I have code like this:
WORDTABLE presidentsOfUSA = 'presidentsOfUSA.csv';
DECLARE Annotation PresidentOfUSA(STRING party, INT yearOfInauguration);
Document{->MARKTABLE(PresidentOfUSA, 1, presidentsOfUSA, "party" = 2,
"yearOfInauguration" = 3)};
CSV like:
Bill Clinton;democrats;1993
Bills Clinton;republicans;2001
Data like:
Bill Clinton is president.
Bills Clinton is president.
Observation:
When I execute the code, it matches only "Bill Clinton" and not "Bills Clinton", even though "Bills Clinton" is present in the data.
It works fine if I keep only one of the two entries in the CSV (either the first or the second).
Thanks in Advance!
This is caused by the lookup algorithm in Ruta. You need to either remove the whitespace in the CSV file or let Ruta remove it by setting the parameter dictRemoveWS to true.
I have a dataset that is like the following:
val df = Seq("samb id 12", "car id 13", "lxu id 88").toDF("list")
I want to create a column containing only the value that comes after id. The result would be something like:
val df_result = Seq(("samb id 12",12), ("car id 13",13), ("lxu id 88",88)).toDF("list", "id_value")
For that, I am trying to use substring. For the parameter that gives the starting position of the substring, I am trying to use locate. But it gives me an error saying that it should be an Int and not a Column.
What I am trying is like:
df
.withColumn("id_value", substring($"list", locate("id", $"list") + 2, 2))
The error I get is:
error: type mismatch;
found : org.apache.spark.sql.Column
required: Int
.withColumn("id_value", substring($"list", locate("id", $"list") + 2, 2))
^
How can I fix this and continue using locate() as a parameter?
UPDATE
Updating to give an example in which @wBob's answer doesn't work for my real-world data: my data is a bit more complicated than the examples above.
It is something like this:
val df = Seq(":option car, lorem :ipsum: :ison, ID R21234, llor ip", "lst ID X49329xas ipsum :ion: ip_s-").toDF("list")
The values are very long strings that don't have a specific pattern.
Somewhere in the string there is always a part written as ID XXXXX. The XXXXX varies, but it is always the same size (5 characters) and always comes after "ID ".
I have not been able to use either split or regexp_extract to get something matching this pattern.
It is not clear if you want the third item or the first number from the list, but here are a couple of examples which should help:
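import org.apache.spark.sql.functions.{regexp_extract, split}
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

// Assign sample data to dataframe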
val df = Seq("samb id 12", "car id 13", "lxu id 88").toDF("list")
df
.withColumn("t1", split($"list", "\\ ")(2))
.withColumn("t2", regexp_extract($"list", "\\d+", 0))
.withColumn("t3", regexp_extract($"list", "(id )(\\d+)", 2))
.withColumn("t4", regexp_extract($"list", "ID [A-Z](\\d{5})", 1))
.show()
You can use functions like split and regexp_extract with withColumn to create new columns based on existing values. split splits the string into an array based on the delimiter you pass in. I have used a space here, escaped with two backslashes. The array is zero-based, so specifying 2 gets the third item in the array.
regexp_extract uses regular expressions to extract from strings. Here I've used \\d, which represents a digit, and +, which matches one or more of them. The third column, t3, again uses regexp_extract with a similar regular expression, but uses brackets to group sections and 2 to get the second group from the regex, i.e. the (\\d+). The fourth column, t4, matches ID followed by a space and one uppercase letter, and captures the five digits after that letter. NB: I'm using additional backslashes in the regex to escape the backslash used in \d.
If your real data is more complicated please post a few simple examples where this code does not work and explain why.
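The t4 pattern can also be tried outside Spark on the two strings from the question's update. The following is a minimal sketch in plain Python with the standard re module, not Spark code. The helper name extract_id is hypothetical, and the sketch assumes the ID is always one uppercase letter followed by five digits, as in the two samples.

```python
import re

# Same pattern as the t4 column: "ID ", one uppercase letter, then five digits.
ID_PATTERN = re.compile(r"ID [A-Z](\d{5})")


def extract_id(text):
    """Return the five digits after 'ID <letter>', or '' if nothing matches."""
    match = ID_PATTERN.search(text)
    return match.group(1) if match else ""


samples = [
    ":option car, lorem :ipsum: :ison, ID R21234, llor ip",
    "lst ID X49329xas ipsum :ion: ip_s-",
]
ids = [extract_id(s) for s in samples]
print(ids)
```

The pattern is case-sensitive, so the lowercase "id 12" strings from the first example do not match it and give an empty string.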
I am experimenting with various things while studying JPA, but I am still unfamiliar with it, so I would like to get some advice.
The parts I got stuck on fall into three main questions. Could you please take a look at the code below?
@Repository
public interface TestRepository extends JpaRepository<TestEntity,Long> {
@Query(" SELECT
A.test1
, A.test2
, B.test1
, B.test2
FROM TEST_TABLE1 A
LEFT JOIN TEST_TABLE2 B
ON A.test_no = B.test_no
WHERE A.test3 = ?1 # Here's the first question
if(VO.test4 is not null) AND B.test4 = ?2") # Here's the second question
List<Object[] # Here's the third question> getTestList(VO);
}
First, is it possible to extract test3 from the VO received when using native sql?
Usually, String test1 is used like this, but I wonder if there is any other way other than this.
Second, if extracting from the VO is possible, can you add a condition to the @Query depending on whether test4 has a value or not?
Thirdly, if I use List<Object[]>, can the result include columns that are not in the already created entity (e.g. test1 in TEST_TABLE2, which is not in the entity of TEST_TABLE1)?
First, is it possible to extract test3 from the VO received when using native sql? Usually, String test1 is used like this, but I wonder if there is any other way other than this.
Yes, it is possible.
Use a SpEL expression in the query, e.g. :#{[0].test3}, which evaluates to vo.test3.
[0] refers to the first parameter passed to the method annotated with @Query.
@Query(value = "SELECT a.test1, a.test2, b.test1, b.test2 "
    + "FROM test_table1 a "
    + "LEFT JOIN test_table2 b ON a.test_no = b.test_no "
    + "WHERE a.test3 = :#{[0].test3}", nativeQuery = true)
List<Object[]> getList(VO vo);
Second, if extracting is possible in VO, can you add a query in @Query depending on whether Test4 is valued or not?
You can use a trick, e.g.:
SELECT ... FROM table a
LEFT JOIN table b ON a.id = b.id
WHERE a.test3 = :#{[0].test3}
AND (:#{[0].test4} IS NULL OR b.test4 = :#{[0].test4})
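A common way to make a filter optional in a single static query is the pattern (param IS NULL OR column = param): when the parameter is null the condition is true for every row, otherwise it filters. The following is a minimal sketch of that pattern using Python's built-in sqlite3 module instead of JPA. The table contents and the function name get_test_list are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_table1 (test_no INTEGER, test1 TEXT, test3 TEXT);
    CREATE TABLE test_table2 (test_no INTEGER, test1 TEXT, test4 TEXT);
    INSERT INTO test_table1 VALUES (1, 'a1', 'x'), (2, 'a2', 'x');
    INSERT INTO test_table2 VALUES (1, 'b1', 'p'), (2, 'b2', 'q');
""")

QUERY = """
    SELECT a.test1, b.test1
    FROM test_table1 a
    LEFT JOIN test_table2 b ON a.test_no = b.test_no
    WHERE a.test3 = :test3
      AND (:test4 IS NULL OR b.test4 = :test4)
    ORDER BY a.test_no
"""


def get_test_list(test3, test4=None):
    """Filter on test4 only when a value is supplied."""
    return conn.execute(QUERY, {"test3": test3, "test4": test4}).fetchall()


print(get_test_list("x"))        # test4 omitted: the optional filter is skipped
print(get_test_list("x", "q"))   # test4 given: the optional filter applies
```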
Thirdly, if I use List<Object[]>, can the result of executing a query that is not in the already created entity (eg, test1 in TEST_TABLE2, which is not in the entity of TEST_TABLE1) can be included?
Sorry, but I do not understand the third question.
Maybe this tutorial will help you: https://www.baeldung.com/jpa-queries-custom-result-with-aggregation-functions
How can I use T-SQL to find a string and, if it exists, return everything before that string?
For instance, in the example below, in an ETL process, how would we take the column from the source, find the string ?uniquecode=, and remove it and everything after it in the SELECT statement for the sink column?
How can I best modify the T-SQL statement below to return the values in the SinkPageURL column above?
SELECT SourcePageURL FROM ExampleTable
I have attempted a Fiddle here - http://sqlfiddle.com/#!18/3b60a/4 using the statement below. However, it drops the values where '?uniquecode=' does not exist, and it also leaves the '?' symbol. This needs to work with MS SQL Server 2017.
Somewhat close, but no cigar. Help appreciated!
SELECT LEFT(SourcePageURL, CHARINDEX('?uniquecode=', SourcePageURL)) FROM sql_test
Try this query:
SELECT
CASE WHEN CHARINDEX('?uniquecode=', SourcePageURL) > 0
THEN SUBSTRING(SourcePageURL,
1,
CHARINDEX('?uniquecode=', SourcePageURL) - 1)
ELSE SourcePageURL END AS new_source
FROM sql_test;
If you instead wanted to update the source URLs in your example using this logic, you could try the following:
UPDATE sql_test
SET SourcePageURL = SUBSTRING(SourcePageURL,
1,
CHARINDEX('?uniquecode=', SourcePageURL) - 1)
WHERE SourcePageURL LIKE '%?uniquecode=%';
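The CASE logic above (cut at the marker if it is present, otherwise return the value unchanged) can be checked outside the database. The following is a minimal sketch of the same rule in Python, not T-SQL. The function name strip_unique_code and the example URLs are hypothetical, since the original sample table is not shown here.

```python
MARKER = "?uniquecode="


def strip_unique_code(url):
    """Return the part of url before MARKER, or url unchanged if MARKER is absent."""
    position = url.find(MARKER)  # -1 when the marker is missing
    return url if position == -1 else url[:position]


# Hypothetical sample values, standing in for SourcePageURL rows.
with_code = strip_unique_code("https://example.com/page?uniquecode=abc123")
without_code = strip_unique_code("https://example.com/page")
print(with_code, without_code)
```

Note the difference in conventions: Python's str.find is zero-based and returns -1 when nothing is found, while CHARINDEX is one-based and returns 0, which is why the SQL version subtracts 1 and tests for > 0.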
I have a table: Task, and another TaskHistory
1 Task -> Many Task Histories
TaskHistory has a field named 'Comment'.
I would like SQL to return a single string containing all of the TaskHistory comments for whatever TaskId I pass in.
Example:
GetTaskHistory(@TaskId)
Returns:
'Comment: some comment \r\n Comment: another comment \r\n Comment: yet another'
I am wondering if returning this from SQL would be faster than a recordset that I loop through in my application to build a string.
Is this possible?
Thank you!
This is possible. However, I would recommend keeping the formatting in the application and simply returning the data from SQL. That is, however, just my opinion on how applications should be separated.
SELECT Comment
FROM TaskHistory
WHERE TaskID = 1
To concatenate column results into a string you can do something like:
DECLARE @HistoryComments nvarchar(MAX)
SELECT @HistoryComments = COALESCE(@HistoryComments + ' \r\n ', '') + Comment
FROM TaskHistory
WHERE TaskID = 1
SELECT TaskHistorys = @HistoryComments
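For comparison, the application-side approach recommended above (return plain rows, format in the application) might look like the following. This is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The function name get_task_history and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TaskHistory (TaskID INTEGER, Comment TEXT);
    INSERT INTO TaskHistory VALUES
        (1, 'some comment'), (1, 'another comment'), (2, 'other task');
""")


def get_task_history(task_id):
    """Fetch the raw comments and build the display string in the application."""
    rows = conn.execute(
        "SELECT Comment FROM TaskHistory WHERE TaskID = ? ORDER BY rowid",
        (task_id,),
    ).fetchall()
    return "\r\n".join(f"Comment: {comment}" for (comment,) in rows)


print(get_task_history(1))
```

Here the query stays a simple SELECT and the "Comment: " prefix and line breaks live in one place in the application code.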
I have built a multistep form using CCK. However, I have a couple of outstanding issues and am not sure whether CCK can help.
In one step of the form I have 2 select boxes. The first is auto-populated from the vocabulary table with the following code, and it all works well.
$category_options = array();
$cat_res = db_query('select vid, name from vocabulary WHERE vid > 1 ORDER BY name ASC');
while ($cat_options = db_fetch_object($cat_res)) {
$category_options[$cat_options->vid] = $cat_options->name;
}
return $category_options;
What I would like to do is, when a user selects one item from the vocabulary list, auto-populate another select box with terms from the term_data table. I have 2 issues:
1) I have added the following code to the second select list, just to make sure it works (IT DOESN'T). There are multiple terms associated with each vocabulary, but the second SQL statement only returns one result when it should return several (SO SOMETHING IS WRONG HERE). For example, in the term_data table there are 6 terms with the vid of 3, but I only get one added to the select list.
$term_options = array();
$term_res = db_query('select vid, name from term_data WHERE vid = 3 ORDER BY name ASC');
while ($options = db_fetch_object($term_res)) {
$term_options[$options->vid] = $options->name;
}
return $term_options;
2) Can I add an onChange to the first select list to call a function that auto-populates the second list using CCK, or do I have to lean towards building my entire form with the Form API?
Any help or thoughts would be very much appreciated.
There seems to be a mistake in the query that gets the terms. Here is a corrected version:
$term_res = db_query('select tid, name from term_data WHERE vid = 3 ORDER BY name ASC');
while ($options = db_fetch_object($term_res)) {
$term_options[$options->tid] = $options->name;
}
In your code you selected vid, which is the same for all terms of a vocabulary. You then added every term name to the $term_options array under that same key, so each one overwrote the previous and you ended up with only 1 element.
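The overwrite can be reproduced in a few lines. The following is a minimal sketch in Python rather than PHP, using a dictionary in place of the PHP array. The three rows are made-up stand-ins for term_data records that share vid = 3.

```python
# Hypothetical rows from term_data as (tid, vid, name). All share vid = 3.
rows = [(10, 3, "Apples"), (11, 3, "Bananas"), (12, 3, "Cherries")]

by_vid = {}
by_tid = {}
for tid, vid, name in rows:
    by_vid[vid] = name  # same key every time, so earlier names are overwritten
    by_tid[tid] = name  # unique key per term, so every name is kept

print(len(by_vid), len(by_tid))
```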
Regarding the second question: I would send the whole data structure (all vocabularies and their terms) as JSON to the client (insert a JS script into the page from your Drupal code) and implement the desired functionality with jQuery.
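One possible shape for that data structure is a list of vocabularies, each carrying its own list of terms, so the client can fill the second select box without another request. The following is a minimal sketch in Python rather than PHP. The field names and sample rows are assumptions, not the actual Drupal schema output.

```python
import json
from collections import defaultdict

# Hypothetical query results: vocabularies as (vid, name), terms as (tid, vid, name).
vocabularies = [(2, "Fruit"), (3, "Vegetables")]
terms = [(10, 2, "Apples"), (11, 2, "Bananas"), (12, 3, "Carrots")]

# Group the terms under the vocabulary they belong to.
terms_by_vid = defaultdict(list)
for tid, vid, name in terms:
    terms_by_vid[vid].append({"tid": tid, "name": name})

payload = [
    {"vid": vid, "name": name, "terms": terms_by_vid[vid]}
    for vid, name in vocabularies
]
json_text = json.dumps(payload)
print(json_text)
```

On the client, a change handler on the first select box would look up the chosen vid in this structure and rebuild the options of the second one.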