How to mirror a queue in ActiveMQ Artemis

I don't know how to set up mirrors in my ActiveMQ Artemis broker.
Every message should be duplicated as described here, but how do I actually do that?
I tried using a non-exclusive divert, but it didn't work. I defined this in broker.xml:
<addresses>
   <address name="source.AA">
      <multicast>
         <queue name="source.AA"/>
      </multicast>
   </address>
   <address name="destination.AA">
      <multicast>
         <queue name="destination.AA"/>
      </multicast>
   </address>
</addresses>
<diverts>
   <divert name="divert-AA">
      <routing-name>divert-AA</routing-name>
      <address>source.AA</address>
      <forwarding-address>destination.AA</forwarding-address>
      <exclusive>false</exclusive>
   </divert>
</diverts>
However, the message is not replicated. What am I doing wrong?

Your configuration actually works fine on Artemis 2.16.0. I would suggest looking at how you are producing messages to that multicast queue.
# after server start
$ bin/artemis queue stat --url tcp://localhost:61616 --user admin --password admin | grep .AA
|destination.AA |destination.AA |0 |0 |0 |0 |0 |0 |MULTICAST |
|source.AA |source.AA |0 |0 |0 |0 |0 |0 |MULTICAST |
# after sending 4 messages
$ bin/artemis queue stat --url tcp://localhost:61616 --user admin --password admin | grep .AA
|destination.AA |destination.AA |0 |4 |4 |0 |0 |0 |MULTICAST |
|source.AA |source.AA |0 |4 |4 |0 |0 |0 |MULTICAST |
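To rule out the producer, you can send test messages straight to the multicast address with the CLI producer (a sketch; the topic:// prefix, credentials, and count are assumptions to adapt to your setup):
$ bin/artemis producer --url tcp://localhost:61616 --user admin --password admin --destination topic://source.AA --message-count 4
The topic:// prefix routes the messages onto the source.AA address via multicast, so the non-exclusive divert forwards a copy to destination.AA while the original is still delivered to the source.AA queue.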

Related

How can I make a unique match when joining two Spark dataframes on different columns?

I have two Spark dataframes (Scala):
First:
+---+----+-----------+---------+-------+
|id |zone|zone_father|father_id|country|
+---+----+-----------+---------+-------+
|2  |1   |123        |1        |0      |
|2  |2   |123        |1        |0      |
|3  |3   |1          |2        |0      |
|2  |4   |123        |1        |0      |
|3  |5   |2          |2        |0      |
|3  |6   |4          |2        |0      |
|3  |7   |19         |2        |0      |
+---+----+-----------+---------+-------+
Second:
+-------+---+----+----------+
|country|id |zone|zone_value|
+-------+---+----+----------+
|0      |2  |1   |7         |
|0      |2  |2   |7         |
|0      |2  |4   |8         |
|0      |0  |0   |2         |
+-------+---+----+----------+
Then I need the following logic:
1 -> If => first.id = second.id && first.zone = second.zone
2 -> Else if => first.father_id = second.id && first.zone_father = second.zone
3 -> If neither of the first two is true, fall back to => first.country = second.zone
And the expected result would be:
+---+----+-----------+---------+-------+----------+
|id |zone|zone_father|father_id|country|zone_value|
+---+----+-----------+---------+-------+----------+
|2  |1   |123        |1        |0      |7         |
|2  |2   |123        |1        |0      |7         |
|3  |3   |1          |2        |0      |7         |
|2  |4   |123        |1        |0      |8         |
|3  |5   |2          |2        |0      |7         |
|3  |6   |4          |2        |0      |8         |
|3  |7   |19         |2        |0      |2         |
+---+----+-----------+---------+-------+----------+
I tried to join both dataframes, but with an "or" condition two results are returned for each row, because the last condition matches regardless of the result of the other two.
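One way to express that precedence is three left joins, one per rule, coalesced in priority order. A sketch, assuming the dataframes are named first and second, a SparkSession named spark is in scope, and each rule matches at most one row of second:
import spark.implicits._
import org.apache.spark.sql.functions.coalesce

// One lookup table per rule, with zone_value renamed so the three sources stay distinct
val byZone    = second.select($"id", $"zone", $"zone_value".as("v1"))
val byFather  = second.select($"id".as("father_id"), $"zone".as("zone_father"), $"zone_value".as("v2"))
val byCountry = second.select($"zone".as("country"), $"zone_value".as("v3"))

val result = first
  .join(byZone, Seq("id", "zone"), "left")
  .join(byFather, Seq("father_id", "zone_father"), "left")
  .join(byCountry, Seq("country"), "left")
  .withColumn("zone_value", coalesce($"v1", $"v2", $"v3"))
  .drop("v1", "v2", "v3")
coalesce takes the first non-null value, so rule 1 wins when it matched, rule 2 applies otherwise, and rule 3 only as a last resort; each input row therefore produces exactly one output row.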

Tableau + Redshift slowness in cursors

I know that Tableau Server uses cursors to refresh extracts; however, for some simple queries with large numbers of columns there is a large inconsistency in how long these cursors take to execute. For example, when I run:
select * from svl_qlog
where userid = (select usesysid from pg_user where usename='tableau')
order by starttime desc
limit 20;
I get:
+------+--------+--------+----+--------------------------+--------------------------+---------+-------+-------+------------------------------------------------------------+------------+------------------------------+------------+
|userid|query   |xid     |pid |starttime                 |endtime                   |elapsed  |aborted|label  |substring                                                   |source_query|concurrency_scaling_status_txt|from_sp_call|
+------+--------+--------+----+--------------------------+--------------------------+---------+-------+-------+------------------------------------------------------------+------------+------------------------------+------------+
|108   |14993377|36192048|3270|2021-08-24 03:34:48.862153|2021-08-24 03:38:09.404563|200542410|0      |default|fetch 100000 in "SQL_CUR7";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993370|36192035|3270|2021-08-24 03:34:41.174557|2021-08-24 03:34:41.185152|10595    |0      |default|fetch 100000 in "SQL_CUR6";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993368|36192034|3270|2021-08-24 03:34:40.991779|2021-08-24 03:34:41.021350|29571    |0      |default|Undoing 1 transactions on table 1728726 with current xid 361|NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993367|36192032|3270|2021-08-24 03:34:40.861741|2021-08-24 03:34:40.907681|45940    |0      |default|fetch 100000 in "SQL_CUR3";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993365|36192025|3262|2021-08-24 03:34:38.135543|2021-08-24 03:34:38.229458|93915    |0      |default|fetch 100000 in "SQL_CUR7";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993363|36192022|3262|2021-08-24 03:34:38.006010|2021-08-24 03:34:38.008911|2901     |0      |default|fetch 100000 in "SQL_CUR6";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993360|36192020|3262|2021-08-24 03:34:37.250081|2021-08-24 03:34:37.885200|635119   |0      |default|Undoing 1 transactions on table 1728724 with current xid 361|NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993359|36192018|3262|2021-08-24 03:34:35.811267|2021-08-24 03:34:35.865765|54498    |0      |default|fetch 100000 in "SQL_CUR3";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993287|36191920|2934|2021-08-24 03:33:16.921494|2021-08-24 03:33:38.143570|21222076 |0      |default|fetch 100000 in "SQL_CUR7";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993285|36191917|2934|2021-08-24 03:33:16.618563|2021-08-24 03:33:16.623745|5182     |0      |default|fetch 100000 in "SQL_CUR6";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993281|36191916|2934|2021-08-24 03:33:15.619813|2021-08-24 03:33:16.493711|873898   |0      |default|Undoing 1 transactions on table 1728722 with current xid 361|NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993280|36191914|2934|2021-08-24 03:33:14.720016|2021-08-24 03:33:14.787236|67220    |0      |default|fetch 100000 in "SQL_CUR3";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993071|36191663|2258|2021-08-24 03:30:25.760462|2021-08-24 03:31:05.340131|39579669 |0      |default|fetch 100000 in "SQL_CUR7";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993069|36191660|2258|2021-08-24 03:30:25.359800|2021-08-24 03:30:25.366651|6851     |0      |default|fetch 100000 in "SQL_CUR6";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993064|36191657|2258|2021-08-24 03:30:25.170646|2021-08-24 03:30:25.245196|74550    |0      |default|Undoing 1 transactions on table 1728720 with current xid 361|NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993063|36191655|2258|2021-08-24 03:30:25.045651|2021-08-24 03:30:25.079935|34284    |0      |default|fetch 100000 in "SQL_CUR3";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993053|36191642|2182|2021-08-24 03:30:18.163032|2021-08-24 03:30:18.381360|218328   |0      |default|fetch 100000 in "SQL_CUR7";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993050|36191638|2182|2021-08-24 03:30:18.029206|2021-08-24 03:30:18.032746|3540     |0      |default|fetch 100000 in "SQL_CUR6";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993043|36191620|2182|2021-08-24 03:30:15.207471|2021-08-24 03:30:15.853592|646121   |0      |default|Undoing 1 transactions on table 1728718 with current xid 361|NULL        |0 - Ran on the main cluster   |NULL        |
|108   |14993042|36191618|2182|2021-08-24 03:30:14.086680|2021-08-24 03:30:14.131522|44842    |0      |default|fetch 100000 in "SQL_CUR3";                                 |NULL        |0 - Ran on the main cluster   |NULL        |
+------+--------+--------+----+--------------------------+--------------------------+---------+-------+-------+------------------------------------------------------------+------------+------------------------------+------------+
Just by eyeballing the elapsed column, you can tell the problem. So, my question is, is there a known reason for some of these cursors to take way longer? Also, is Tableau or Redshift the culprit here? And what's with these:
Undoing 1 transactions on table 1728722 with current xid 361
Thanks!
The first fetch on the cursor runs the query. Subsequent fetches just retrieve the next set of data that is already computed and waiting on the leader node. In your example it looks like your returned data is less than 100K rows, as I don't see subsequent fetches for the same transaction. Is this correct, or are you missing some fetches? Since you say that you are refreshing extracts, it seems like more data would be expected, but I don't know your situation.
Given this, it is likely that the different fetch times are due to differing query complexities and run times. If you look at the activity (svl_statementtext) for these XIDs, you can see what query is being sent to populate each cursor. You can then run these queries yourself and see how complex they are and what their general run times are.
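For instance, something along these lines (using one of the XIDs from the output above) pulls back the SQL for that transaction; svl_statementtext stores long statements in 200-character chunks, so ordering by sequence reassembles them:
select starttime, sequence, text
from svl_statementtext
where xid = 36192048
order by starttime, sequence;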
Other possibilities include queuing delays and/or high query load of the database and/or high load on the leader node.
Do you know the size of the dataset being returned? If it is very large, there are likely better ways to get the refresh data to Tableau than pulling it all through the leader node with a select query, but these methods take effort to set up.
Questions to answer - What queries go with which fetches? How long do these take to run standalone? How much data does each query return? Knowing these answers will help in setting a path for improvement.

getting duplicate count but retaining duplicate rows in pyspark

I am trying to find the duplicate count of rows in a pyspark dataframe. I found a similar answer here
but it only outputs a binary flag. I would like to have the actual count for each row.
To use the original post's example, if I have a dataframe like so:
+--+--+--+--+
|a |b |c |d |
+--+--+--+--+
|1 |0 |1 |2 |
|0 |2 |0 |1 |
|1 |0 |1 |2 |
|0 |4 |3 |1 |
|1 |0 |1 |2 |
+--+--+--+--+
I would like to result in something like:
+--+--+--+--+---------+
|a |b |c |d |row_count|
+--+--+--+--+---------+
|1 |0 |1 |2 |3        |
|0 |2 |0 |1 |0        |
|1 |0 |1 |2 |3        |
|0 |4 |3 |1 |0        |
|1 |0 |1 |2 |3        |
+--+--+--+--+---------+
Is this possible?
Thank You
Assuming df is your input dataframe:
from pyspark.sql.window import Window
from pyspark.sql import functions as F

# Count the rows in each group of identical (a, b, c, d) values
w = Window.partitionBy("a", "b", "c", "d")
df = df.withColumn("row_count", F.count("a").over(w))
If, as per your example, you want to replace every count of 1 with 0, do:
from pyspark.sql.window import Window
from pyspark.sql import functions as F

w = Window.partitionBy("a", "b", "c", "d")
df = (df.withColumn("row_count", F.count("a").over(w))
        .withColumn("row_count", F.when(F.col("row_count") == 1, F.lit(0))
                                  .otherwise(F.col("row_count"))))
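An equivalent sketch without a window function, if you prefer an aggregate-and-join (same df assumption as above):
from pyspark.sql import functions as F

# Count each distinct row once, then join the counts back onto every row
counts = df.groupBy("a", "b", "c", "d").count()
df = (df.join(counts, ["a", "b", "c", "d"], "left")
        .withColumn("row_count", F.when(F.col("count") == 1, F.lit(0))
                                  .otherwise(F.col("count")))
        .drop("count"))
Both versions shuffle on the same four columns, so the choice is mostly a matter of style.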

Spark 1.6 VectorAssembler unexpected results

I am trying to create a label-feature DataFrame using Spark's VectorAssembler.
According to the Spark docs, it should be as simple as this:
val incidentDF = sqlContext.sql("select `is_similar`, `cosine_similarity`,..... from some.table")
//vectorassembler: compact all relevant columns into a vector
val assembler = new VectorAssembler()
assembler.setInputCols(Array("cosine_similarity", ...))
assembler.setOutputCol("features")
val output = assembler.transform(incidentDF).select("is_similar", "features").withColumnRenamed("is_similar", "label")
However, I get unexpected results.
This:
+----------+---------------------+----------------------------+----------------------+-----------------------------+-----------------------+------------------------------+--------------------+-------------+----------------+-------------+-------------+-------------------+--------+-------------------+---------------------------+----------------------------------+----------------------------+-----------------------------------+-----------------------------+------------------------------------+--------------------+------------------------------------------+-----------------------------------+------------------------------------+-----------------------------+
|0 |0.21437323142813602 |0.08703882797784893 |0.23570226039551587 |0.10050378152592121 |0.10206207261596577 |0.0 |1 |1 |1 |1 |1 |1 |1 |0.26373626373626374|0.012967453461681464 |0.007624195465949381 |0.014425347541872306 |0.008896738386617248 |0.022695267556861232 |0.0 |1 |0.16838138468917166 |0.15434287415564008 |0.3922322702763681 |0.34874291623145787 |
|1 |0.5303300858899107 |0.5017452060042545 |0.5303300858899107 |0.5017452060042545 |0.5303300858899107 |0.5017452060042545 |1 |1 |1 |1 |1 |1 |1 |0.6870229007633588 |0.3534850108895589 |0.5857224407945156 |0.36079979664267925 |0.5853463384675868 |0.36971703925333405 |0.5814734067275937 |0 |1.0 |0.9999999999999998 |1.0 |0.9999999999999998 |
|0 |0.31754264805429416 |0.30151134457776363 |0.33541019662496846 |0.3344968040028363 |0.2867696673382022 |0.26111648393354675 |1 |1 |0 |1 |1 |1 |1 |0.41600000000000004|0.10867521883199269 |0.1920005048084368 |0.1322792942407786 |0.2477844869237889 |0.11802058757911914 |0.16554971608261862 |1 |0.0 |0.01605611773109364 |0.0 |0.16666666666666666 |
|0 |0.16169041669088866 |0.0 |0.1666666666666667 |0.0 |0.09622504486493764 |0.0 |1 |1 |1 |1 |1 |1 |1 |0.26666666666666666|0.012517205514308224 |0.0 |0.012752837227090714 |0.0 |0.021516657911501622 |0.0 |1 |0.16838138468917166 |0.15434287415564008 |0.3922322702763681 |0.34874291623145787 |
|0 |0.2750456656690116 |0.1860521018838127 |0.2858309752375147 |0.19611613513818402 |0.223606797749979 |0.1386750490563073 |1 |1 |1 |1 |1 |1 |1 |0.34862385321100914|0.06278282792172384 |0.09178430436891666 |0.06694373400084344 |0.08253907697526759 |0.07508140721703477 |0.10856631569349082 |1 |0.3014783135305502 |0.25688979598845174 |0.5590169943749475 |0.47628967220784013 |
|0 |0.2449489742783178 |0.19810721293758182 |0.26352313834736496 |0.2307692307692308 |0.21629522817435007 |0.16012815380508716 |1 |1 |0 |1 |1 |1 |1 |0.4838709677419355 |0.12209521675839743 |0.19126420671254496 |0.1475066405521753 |0.2459312750965279 |0.1242978535834829 |0.1886519686826469 |1 |0.0 |0.01605611773109364 |0.0 |0.16666666666666666 |
|0 |0.08320502943378437 |0.09642365197998375 |0.11952286093343938 |0.13912166872805048 |0.0 |0.0 |0 |0 |0 |1 |0 |0 |1 |0.12 |0.04035362208133099 |0.04456121367953338 |0.04819698770773715 |0.0538656145326838 |0.0 |0.0 |8 |0.05825659037076343 |0.05246835256923818 |0.112089707663561 |0.11278230910134424 |
|0 |0.20784609690826525 |0.1846372364689991 |0.26111648393354675 |0.24806946917841688 |0.0 |0.0 |0 |0 |0 |1 |0 |1 |1 |0.0 |0.07233915683015167 |0.0716540790026919 |0.08229370516713722 |0.08299754342027771 |0.0 |0.0 |6 |0.04977054860197747 |0.06558734556106822 |0.09607689228305229 |0.21759706994462227 |
|1 |0.8926577981869824 |0.9066143160193102 |0.914335372996105 |0.9226517385233938 |0.5477225575051661 |0.6324555320336759 |0 |0 |0 |0 |0 |0 |1 |0.5309734513274337 |0.8734996606615234 |0.8946928809168011 |0.8791317315987442 |0.8973856295754765 |0.3496004425218079 |0.48223175160299564 |0 |0.0 |0.0 |0.0 |0.0 |
|1 |0.5185629788417315 |0.8432740427115678 |0.5118906968889915 |0.8819171036881969 |0.24253562503633297 |0.3333333333333333 |1 |1 |0 |1 |1 |1 |1 |0.09375 |0.18908955158360016 |0.8022196858263557 |0.17544355300115252 |0.8474955187144462 |0.13927839835275616 |0.2838123484309787 |6 |0.0 |0.0 |0.0 |0.0 |
|1 |0.0 |0.0 |0.0 |0.0 |0.0 |0.0 |0 |0 |1 |1 |0 |0 |1 |0.14814814814814814|0.0 |0.0 |0.0 |0.0 |0.0 |0.0 |1 |0.02170244443925667 |0.020410228072244255 |0.15062893357603016 |0.28922903686544305 |
|0 |0.26860765467512676 |0.06271815075053182 |0.29515063885057 |0.07485976927589244 |0.0 |0.0 |0 |0 |1 |1 |0 |0 |1 |0.08 |0.04804110216570731 |0.03027143543580809 |0.05341183077151175 |0.03431607006581793 |0.0 |0.0 |1 |0.0 |0.022192268824097448 |0.0 |0.24019223070763074 |
|1 |0.33333333333333337 |0.40824829046386296 |0.33333333333333337 |0.40824829046386296 |0.33333333333333337 |0.40824829046386296 |0 |0 |0 |1 |0 |1 |1 |0.4516129032258064 |0.3310013083604027 |0.3537516145932176 |0.3444032278588375 |0.3667764454925114 |0.3042153384207993 |0.3408010155297054 |6 |0.28297384452448776 |0.23615630148525626 |0.2182178902359924 |0.19245008972987526 |
|0 |0.0519174131651165 |0.0 |0.0917662935482247 |0.0 |0.0 |0.0 |0 |0 |1 |1 |0 |0 |1 |0.0967741935483871 |0.03050544547960052 |0.0 |0.0490339271669166 |0.0 |0.0 |0.0 |5 |0.0 |0.0 |0.0 |0.0 |
|0 |0.049160514400834666 |0.0 |0.02627034687463669 |0.0 |0.0 |0.0 |0 |0 |0 |0 |0 |0 |1 |0.1282051282051282 |0.006316709944109247 |0.0 |0.003132143258557757 |0.0 |0.0 |0.0 |3 |0.0 |0.019794166951004794 |0.0 |0.15638581054280606 |
|0 |0.07082882469748285 |0.0 |0.08494119857293758 |0.0 |0.0 |0.0 |0 |0 |0 |1 |0 |1 |1 |0.06060606060606061|0.004924318378089263 |0.0 |0.005845759285912874 |0.0 |0.0 |0.0 |4 |0.023119472246583003 |0.010659666129102227 |0.03210289415620512 |0.04420122177473814 |
|0 |0.1924976258772545 |0.038014296063485276 |0.19149207069693872 |0.02521364528296496 |0.0 |0.0 |0 |0 |0 |1 |0 |1 |1 |0.125 |0.020931167922971575 |0.00448818821863432 |0.02118543184402528 |0.0026553570889578286 |0.0 |0.0 |5 |0.02336541089352552 |0.02401310014140845 |0.11919975664202526 |0.10760330515353056 |
|1 |0.17095921484405754 |0.08434614994311695 |0.20073126386549828 |0.10085458113185984 |0.0 |0.0 |0 |0 |1 |0 |0 |1 |1 |0.07407407407407407|0.09182827200781651 |0.05443489342945772 |0.10010815165693956 |0.05842165588249673 |0.0 |0.0 |8 |0.2973721930047951 |0.168690765981807 |0.5637584095764486 |0.48478000681923245 |
|0 |0.1405456737852613 |0.049147318718299055 |0.11846977555181847 |0.08333333333333333 |0.22360679774997896 |0.0 |1 |1 |1 |1 |1 |1 |1 |0.08333333333333331|0.01937969263670974 |0.003427781939920998 |0.022922840542318093 |0.006443992956721386 |0.03572605281706383 |0.0 |5 |0.26345546669165004 |0.2557786050767472 |0.405007416909787 |0.45121260440202404 |
|1 |0.6793662204867575 |0.753778361444409 |0.5773502691896258 |0.6396021490668313 |0.5773502691896258 |0.8164965809277259 |0 |0 |1 |1 |0 |0 |1 |0.6875 |0.7466360531069871 |0.8217912018147824 |0.7034677645212848 |0.6620051533994062 |0.469853400225108 |0.9321213932723664 |6 |0.0 |0.011793139853629018 |0.0 |0.14433756729740643 |
+----------+---------------------+----------------------------+----------------------+-----------------------------+-----------------------+------------------------------+--------------------+-------------+----------------+-------------+-------------+-------------------+--------+-------------------+---------------------------+----------------------------------+----------------------------+-----------------------------------+-----------------------------+------------------------------------+--------------------+------------------------------------------+-----------------------------------+------------------------------------+-----------------------------+
Becomes this:
+-----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|label|features |
+-----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|0 |[0.21437323142813602,0.08703882797784893,0.23570226039551587,0.10050378152592121,0.10206207261596577,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.26373626373626374,0.012967453461681464,0.007624195465949381,0.014425347541872306,0.008896738386617248,0.022695267556861232,0.0,1.0,0.16838138468917166,0.15434287415564008,0.3922322702763681,0.34874291623145787] |
|1 |[0.5303300858899107,0.5017452060042545,0.5303300858899107,0.5017452060042545,0.5303300858899107,0.5017452060042545,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.6870229007633588,0.3534850108895589,0.5857224407945156,0.36079979664267925,0.5853463384675868,0.36971703925333405,0.5814734067275937,0.0,1.0,0.9999999999999998,1.0,0.9999999999999998] |
|0 |[0.31754264805429416,0.30151134457776363,0.33541019662496846,0.3344968040028363,0.2867696673382022,0.26111648393354675,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.41600000000000004,0.10867521883199269,0.1920005048084368,0.1322792942407786,0.2477844869237889,0.11802058757911914,0.16554971608261862,1.0,0.0,0.01605611773109364,0.0,0.16666666666666666] |
|0 |[0.16169041669088866,0.0,0.1666666666666667,0.0,0.09622504486493764,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.26666666666666666,0.012517205514308224,0.0,0.012752837227090714,0.0,0.021516657911501622,0.0,1.0,0.16838138468917166,0.15434287415564008,0.3922322702763681,0.34874291623145787] |
|0 |[0.2750456656690116,0.1860521018838127,0.2858309752375147,0.19611613513818402,0.223606797749979,0.1386750490563073,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.34862385321100914,0.06278282792172384,0.09178430436891666,0.06694373400084344,0.08253907697526759,0.07508140721703477,0.10856631569349082,1.0,0.3014783135305502,0.25688979598845174,0.5590169943749475,0.47628967220784013]|
|0 |[0.2449489742783178,0.19810721293758182,0.26352313834736496,0.2307692307692308,0.21629522817435007,0.16012815380508716,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.4838709677419355,0.12209521675839743,0.19126420671254496,0.1475066405521753,0.2459312750965279,0.1242978535834829,0.1886519686826469,1.0,0.0,0.01605611773109364,0.0,0.16666666666666666] |
|0 |[0.08320502943378437,0.09642365197998375,0.11952286093343938,0.13912166872805048,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.12,0.04035362208133099,0.04456121367953338,0.04819698770773715,0.0538656145326838,0.0,0.0,8.0,0.05825659037076343,0.05246835256923818,0.112089707663561,0.11278230910134424] |
|0 |[0.20784609690826525,0.1846372364689991,0.26111648393354675,0.24806946917841688,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.07233915683015167,0.0716540790026919,0.08229370516713722,0.08299754342027771,0.0,0.0,6.0,0.04977054860197747,0.06558734556106822,0.09607689228305229,0.21759706994462227] |
|1 |(25,[0,1,2,3,4,5,12,13,14,15,16,17,18,19],[0.8926577981869824,0.9066143160193102,0.914335372996105,0.9226517385233938,0.5477225575051661,0.6324555320336759,1.0,0.5309734513274337,0.8734996606615234,0.8946928809168011,0.8791317315987442,0.8973856295754765,0.3496004425218079,0.48223175160299564]) |
|1 |[0.5185629788417315,0.8432740427115678,0.5118906968889915,0.8819171036881969,0.24253562503633297,0.3333333333333333,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.09375,0.18908955158360016,0.8022196858263557,0.17544355300115252,0.8474955187144462,0.13927839835275616,0.2838123484309787,6.0,0.0,0.0,0.0,0.0] |
|1 |(25,[8,9,12,13,20,21,22,23,24],[1.0,1.0,1.0,0.14814814814814814,1.0,0.02170244443925667,0.020410228072244255,0.15062893357603016,0.28922903686544305]) |
|0 |(25,[0,1,2,3,8,9,12,13,14,15,16,17,20,22,24],[0.26860765467512676,0.06271815075053182,0.29515063885057,0.07485976927589244,1.0,1.0,1.0,0.08,0.04804110216570731,0.03027143543580809,0.05341183077151175,0.03431607006581793,1.0,0.022192268824097448,0.24019223070763074]) |
|1 |[0.33333333333333337,0.40824829046386296,0.33333333333333337,0.40824829046386296,0.33333333333333337,0.40824829046386296,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.4516129032258064,0.3310013083604027,0.3537516145932176,0.3444032278588375,0.3667764454925114,0.3042153384207993,0.3408010155297054,6.0,0.28297384452448776,0.23615630148525626,0.2182178902359924,0.19245008972987526]|
|0 |(25,[0,2,8,9,12,13,14,16,20],[0.0519174131651165,0.0917662935482247,1.0,1.0,1.0,0.0967741935483871,0.03050544547960052,0.0490339271669166,5.0]) |
|0 |(25,[0,2,12,13,14,16,20,22,24],[0.049160514400834666,0.02627034687463669,1.0,0.1282051282051282,0.006316709944109247,0.003132143258557757,3.0,0.019794166951004794,0.15638581054280606]) |
|0 |(25,[0,2,9,11,12,13,14,16,20,21,22,23,24],[0.07082882469748285,0.08494119857293758,1.0,1.0,1.0,0.06060606060606061,0.004924318378089263,0.005845759285912874,4.0,0.023119472246583003,0.010659666129102227,0.03210289415620512,0.04420122177473814]) |
|0 |[0.1924976258772545,0.038014296063485276,0.19149207069693872,0.02521364528296496,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.125,0.020931167922971575,0.00448818821863432,0.02118543184402528,0.0026553570889578286,0.0,0.0,5.0,0.02336541089352552,0.02401310014140845,0.11919975664202526,0.10760330515353056] |
|1 |[0.17095921484405754,0.08434614994311695,0.20073126386549828,0.10085458113185984,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.07407407407407407,0.09182827200781651,0.05443489342945772,0.10010815165693956,0.05842165588249673,0.0,0.0,8.0,0.2973721930047951,0.168690765981807,0.5637584095764486,0.48478000681923245] |
|0 |[0.1405456737852613,0.049147318718299055,0.11846977555181847,0.08333333333333333,0.22360679774997896,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.08333333333333331,0.01937969263670974,0.003427781939920998,0.022922840542318093,0.006443992956721386,0.03572605281706383,0.0,5.0,0.26345546669165004,0.2557786050767472,0.405007416909787,0.45121260440202404] |
|1 |[0.6793662204867575,0.753778361444409,0.5773502691896258,0.6396021490668313,0.5773502691896258,0.8164965809277259,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.6875,0.7466360531069871,0.8217912018147824,0.7034677645212848,0.6620051533994062,0.469853400225108,0.9321213932723664,6.0,0.0,0.011793139853629018,0.0,0.14433756729740643] |
+-----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
And as you can see, the output mixes two different representations, rather than one unified vector format.
Is this a bug in CDH's Spark (1.6) or am I missing something?
TL;DR This is a normal behavior.
Your data contains a number of sparse rows. When assembled, these are converted to SparseVectors and represented in the output as
(size, [idx1, idx2, ..., idxm], [val1, val2, ..., valm])
where idx1..idxm are the positions of the non-zero values and val1..valm the corresponding values. So the following
(25,[8,9,12,13, ...],[1.0,1.0,1.0,0.14814814814814814, ...])
is a SparseVector of size 25 whose value at index 9 is 1.0 and at index 13 is 0.148 (indices are zero-based).
If a row is dense (fewer than half of its values are zero), you get a DenseVector, which in your output is represented as:
[val0, val1, ..., valn]
Both representations are perfectly valid, and the majority of tools will accept either just fine.
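If a downstream consumer does insist on a single uniform representation, you can densify everything at the cost of memory. A minimal sketch for Spark 1.6, where the ml feature transformers still emit org.apache.spark.mllib.linalg vectors (output is the assembled dataframe from the question):
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.{col, udf}

// Rebuild each vector in the dense layout; the values themselves are unchanged
val toDense = udf((v: Vector) => Vectors.dense(v.toArray))
val denseOutput = output.withColumn("features", toDense(col("features")))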

Data loss after writing in spark

I obtain a resultant dataframe after performing some computations. Say the dataframe is result. When I write it to Amazon S3, specific cells are shown blank. The top 5 rows of my result dataframe are:
+--------+-------+-------+-----+-----+-----+-----+
|var30   |var31  |var32  |var33|var34|var35|var36|
+--------+-------+-------+-----+-----+-----+-----+
|-0.00586|0.13821|0      |     |1    |     |     |
|3.87635 |2.86702|2.51963|8    |11   |2    |14   |
|3.78279 |2.54833|2.45881|     |2    |     |     |
|-0.10092|0      |0      |1    |1    |3    |1    |
|8.08797 |6.14486|5.25718|     |5    |     |     |
+--------+-------+-------+-----+-----+-----+-----+
But when I run the result.show() command I am able to see the values:
+--------+-------+-------+-----+-----+-----+-----+
|var30   |var31  |var32  |var33|var34|var35|var36|
+--------+-------+-------+-----+-----+-----+-----+
|-0.00586|0.13821|0      |2    |1    |1    |6    |
|3.87635 |2.86702|2.51963|8    |11   |2    |14   |
|3.78279 |2.54833|2.45881|2    |2    |2    |12   |
|-0.10092|0      |0      |1    |1    |3    |1    |
|8.08797 |6.14486|5.25718|20   |5    |5    |34   |
+--------+-------+-------+-----+-----+-----+-----+
Also, the blanks appear in the same cells every time I run it.
Use this to save the data to your S3 bucket:
result.repartition(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("s3n://Yourpath")
For anyone who might have come across this issue, I can tell you what worked for me.
I was joining one dataframe (let's say inputDF) with another (deltaDF) based on some logic and storing the result in an output dataframe (outDF). I was hitting the same issue, whereby I could see a record in outDF.show(), but while writing this dataframe into a Hive table or persisting outDF (using outDF.persist(StorageLevel.MEMORY_AND_DISK)) I wasn't able to see that particular record.
SOLUTION: I persisted inputDF (inputDF.persist(StorageLevel.MEMORY_AND_DISK)) before joining it with deltaDF. After that, the outDF.show() output was consistent with the Hive table where outDF was written.
P.S.: I am not sure how this solved the issue. It would be awesome if someone could explain this, but the above worked for me.
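A minimal sketch of that workaround (the names follow the description above; the join keys and write target are placeholders):
import org.apache.spark.storage.StorageLevel

// Materialize the input once so every downstream action sees the same data
val stableInput = inputDF.persist(StorageLevel.MEMORY_AND_DISK)
stableInput.count() // forces evaluation so the persisted copy is populated

val outDF = stableInput.join(deltaDF, Seq("id"), "left_outer")
outDF.write.mode("overwrite").saveAsTable("db.out_table")
One plausible explanation: if anything upstream is non-deterministic, each action (show, write, persist) can trigger a fresh recomputation that yields slightly different rows; persisting the input pins the data down so all actions agree.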