How can I make my dataframe sample consistent in each run using Apache Spark (Scala)?

I have a peculiar scenario in which the samples obtained in two consecutive runs are not consistent, even though I've provided a seed value. I'm using the following code (which was the outcome of an earlier discussion):
var conversionSample = sortedConversionSubset.sample(true, (sampleSize + 0.05), 3*x).limit((conversionCount * sampleSize).toInt)
var nonConversionSample = sortedNonConversionSubset.sample(true, (sampleSize + 0.05), 3*x).limit((nonConversionCount * sampleSize).toInt)
Here
'sampleSize' is a constant fraction value less than 0.8
'x' is a constant int, which represents xth iteration in a for loop
'conversionCount' and 'nonConversionCount' are int values representing number of rows in each subset
The observation is that two successive runs generate different samples, which was not the expected behavior.
sortedConversionSubset
+--------------------------------------+----------+
|clientid |Conversion|
+--------------------------------------+----------+
|02438b66-2de4-4765-bae3-de7453647ea7_1|1 |
|203865ed-f02a-4ed9-9098-82691de707a4_0|1 |
|203865ed-f02a-4ed9-9098-82691de707a4_1|1 |
|674e2337-aec5-434e-b56e-8c2efcc42894_1|1 |
|6d6036d3-c161-4f5d-8557-80b85dd87bd9_0|1 |
|6d6036d3-c161-4f5d-8557-80b85dd87bd9_1|1 |
|7797aba3-3eea-4556-856e-753812b4b551_0|1 |
|7797aba3-3eea-4556-856e-753812b4b551_1|1 |
|870ab2a5-0650-42b8-9e6f-bde3859f64fd_0|1 |
|870ab2a5-0650-42b8-9e6f-bde3859f64fd_1|1 |
|9b606693-4ffa-44a5-bd7c-cc6974ce3e83_0|1 |
|be218b72-c664-40cf-adf5-e3519095e941_0|1 |
|e7dc7fd9-32df-46a1-b3bd-793bbda09f6f_0|1 |
|eaf434da-6a8f-4ab0-a744-62bea663ed5e_0|1 |
|eaf434da-6a8f-4ab0-a744-62bea663ed5e_1|1 |
+--------------------------------------+----------+
sortedNonConversionSubset
+--------------------------------------+----------+
|clientid |Conversion|
+--------------------------------------+----------+
|03358d8f-9b9c-4258-9c99-234ab102c29b_1|0 |
|040d213c-e91a-42f4-9bf7-90671670dc17_0|0 |
|04fe5148-1c56-4c88-aed0-1f01220bffd6_0|0 |
|0ed2e621-9ba4-46f0-8793-a84d32538c39_0|0 |
|0f9bcf42-e7fa-49a0-9d75-6c9bbc38b4d5_0|0 |
|108c5478-abc0-44d9-968b-47f81c4f5a37_0|0 |
|129eb883-159d-49be-b8ae-9aa44a3e2919_0|0 |
|13e3d779-026b-4d12-8619-aa5fe6ca99ed_0|0 |
|14497295-eebd-44aa-9f26-fc5e4810fb54_0|0 |
|1855d96d-3647-4c4f-a20f-7e46f7635798_0|0 |
|1911caf0-a470-4898-9b62-57c604422727_0|0 |
|1b91b8dc-09b8-47e2-b892-f5c14b650019_0|0 |
|1dfa820c-77e0-4927-8a39-ecd8e842b09b_0|0 |
|1e48e346-4ada-4a8d-896b-7658cc2499cd_0|0 |
|252be902-4204-40a5-9d3c-dd3a7d0f0355_0|0 |
|2995b49d-525b-43e9-ab36-8b8910a4607c_0|0 |
|2bc06b59-4624-4ddd-87a3-ed04cba88233_0|0 |
|2d4538a5-20e6-4742-ae46-aad0a5ed3fff_0|0 |
|31563716-9380-4662-90e5-7f63a1ab9072_0|0 |
|34442a3e-0437-4c41-86fb-1ac55062993a_0|0 |
|35151629-2f86-4917-90d2-42daa5ae4f5c_0|0 |
|3c37e066-dff5-4bd9-84ab-b9e73f3f3fdd_0|0 |
|3e998096-3a4b-4b57-a1de-69d2dbd19abd_0|0 |
|3f8ace3c-d378-4423-97a0-3d9cf35ba256_0|0 |
|49a0cfb8-490f-4252-84fa-2b9e250e9333_0|0 |
|4c3f11fa-e3ba-4eb1-977a-06f034bf8a54_0|0 |
|4ee484f4-e877-44c3-9390-c4e4072c5dee_0|0 |
|4fa035b3-dcd5-40e1-9107-0a0c943ff597_1|0 |
|529704d2-5a60-4718-a03f-639e040f6634_0|0 |
|560f6978-028b-4a37-9f97-d97e93976bf7_0|0 |
|57b47c74-b071-4278-89c9-f7b4cb1225d1_0|0 |
|58305773-f944-4039-8452-f5eb8d62f0cf_0|0 |
|58dfa9dd-43cf-4eb7-ade6-7235004a9815_0|0 |
|5b146218-9bb6-46f0-8c83-df131d78f591_0|0 |
|5ca3b5bc-35a9-42a5-bd37-a8fc94366dc6_0|0 |
|5d5f2ea0-aed9-4c2d-8c22-68859ec35e8e_0|0 |
|5f9ebf92-3b1b-4628-b949-44a32e6d3659_0|0 |
|64822b8c-009e-48ab-b6ca-1a7ece1106fa_0|0 |
|6b352714-af74-4773-854b-073e644e8684_0|0 |
|6e528e49-472e-48c7-baa9-edc25303e427_0|0 |
|73203f58-8be2-4716-b8f0-79c64400c57b_0|0 |
|741630e0-1c99-497d-a127-5c4c562952c5_0|0 |
|778e3b8a-2ca5-469a-9697-f646962e8308_0|0 |
|8029c542-d933-43fb-b359-f2438dcd5660_0|0 |
|8b06ba24-2af3-4eec-811a-4d1779f37876_0|0 |
|8fb43dff-260d-4ece-85e2-3bc2cb636ac1_0|0 |
|90f8a4cb-1956-43c4-ac7d-8c6514cd023a_0|0 |
|916f2e2a-6135-4004-8d54-d80b822ce394_0|0 |
|968a7ca3-1649-4586-9e60-b7e8565e708a_0|0 |
|a32782cc-8c4c-403b-aa83-09f1cec45fdb_0|0 |
|a63f44d5-a4d5-45a0-8a4b-cebf05df810b_0|0 |
|a6f958bc-e050-4216-b981-d51f1c0ff60d_0|0 |
|a7dba1bb-d7ff-44e6-9c4c-997ae59a2337_1|0 |
|ac33d675-d9cc-43b5-94fb-7d412773db14_0|0 |
|b1227816-9bf2-474f-8e82-5739acf6c895_0|0 |
|b1c27a2e-6efc-4869-880b-9ce0a4962edc_0|0 |
|b4ff6d43-cf0a-4f1d-9431-1edcb8ee1fb6_0|0 |
|b9e477ab-2065-42bb-832b-5d0e98ee05c7_0|0 |
|ba8c4efe-e71c-468c-b1bf-37efff596907_0|0 |
|c21eefc8-43d0-4be0-a252-b9fc4dbb7ad0_0|0 |
|c3785311-87c8-43bc-99a8-01d64f5eaa87_0|0 |
|c543bde7-deb8-4484-b0be-353c44baf6eb_1|0 |
|ca31e550-9d28-4628-bfe8-53648a2007f7_0|0 |
|cbc33697-20cb-4f8b-accd-0a6396a4ea41_0|0 |
|cc7810aa-08fc-44e7-acdc-ac948a28f9b9_0|0 |
|d1efdc7c-afb0-4995-bbbd-a76f731d2492_0|0 |
|d6a4b928-e576-41d7-9628-18709765199d_0|0 |
|d7311ec7-6c50-448d-8a6e-f690c3070d57_1|0 |
|d86b09f9-70a0-4101-a13b-129fe3a37b86_0|0 |
|d911be5b-aceb-45c8-a79e-73ccfa1b96f0_0|0 |
|db0c7b10-80f7-4071-aa53-fe0e2dc5ebce_0|0 |
|dce14c51-fa57-4e98-987d-708e2a9aa293_0|0 |
|dd026fb8-f818-4d1e-aaa4-4c9b3fd24994_0|0 |
|dfa9c55c-1e75-4010-be86-a6b1eb723672_0|0 |
|ea29f600-9e85-40f4-9f88-dcef46beb0c1_0|0 |
|eb5e58fc-eaac-4059-8ebc-1fab1ccf3555_1|0 |
|eb7568ab-83ac-45a7-bf4b-3b048d6c7c53_0|0 |
|f5b1cfc4-e397-4699-adab-0af6ee0e1b76_0|0 |
|facbfc8c-d477-4b27-bf15-52a56c26cbf6_0|0 |
|ffd03bca-ef40-4fa4-913e-73c002f29796_0|0 |
+--------------------------------------+----------+
1st Run Sample
+--------------------------------------+----------+
|clientid |Conversion|
+--------------------------------------+----------+
|203865ed-f02a-4ed9-9098-82691de707a4_1|1 |
|6d6036d3-c161-4f5d-8557-80b85dd87bd9_0|1 |
|6d6036d3-c161-4f5d-8557-80b85dd87bd9_1|1 |
|02438b66-2de4-4765-bae3-de7453647ea7_1|1 |
|7797aba3-3eea-4556-856e-753812b4b551_0|1 |
|870ab2a5-0650-42b8-9e6f-bde3859f64fd_0|1 |
|1dfa820c-77e0-4927-8a39-ecd8e842b09b_0|0 |
|252be902-4204-40a5-9d3c-dd3a7d0f0355_0|0 |
|2995b49d-525b-43e9-ab36-8b8910a4607c_0|0 |
|2bc06b59-4624-4ddd-87a3-ed04cba88233_0|0 |
|31563716-9380-4662-90e5-7f63a1ab9072_0|0 |
|5ca3b5bc-35a9-42a5-bd37-a8fc94366dc6_0|0 |
|5d5f2ea0-aed9-4c2d-8c22-68859ec35e8e_0|0 |
|5f9ebf92-3b1b-4628-b949-44a32e6d3659_0|0 |
|5f9ebf92-3b1b-4628-b949-44a32e6d3659_0|0 |
|5f9ebf92-3b1b-4628-b949-44a32e6d3659_0|0 |
|6b352714-af74-4773-854b-073e644e8684_0|0 |
|6e528e49-472e-48c7-baa9-edc25303e427_0|0 |
|6e528e49-472e-48c7-baa9-edc25303e427_0|0 |
|741630e0-1c99-497d-a127-5c4c562952c5_0|0 |
|03358d8f-9b9c-4258-9c99-234ab102c29b_1|0 |
|040d213c-e91a-42f4-9bf7-90671670dc17_0|0 |
|040d213c-e91a-42f4-9bf7-90671670dc17_0|0 |
|04fe5148-1c56-4c88-aed0-1f01220bffd6_0|0 |
|129eb883-159d-49be-b8ae-9aa44a3e2919_0|0 |
|1855d96d-3647-4c4f-a20f-7e46f7635798_0|0 |
|3c37e066-dff5-4bd9-84ab-b9e73f3f3fdd_0|0 |
|3e998096-3a4b-4b57-a1de-69d2dbd19abd_0|0 |
|3f8ace3c-d378-4423-97a0-3d9cf35ba256_0|0 |
|49a0cfb8-490f-4252-84fa-2b9e250e9333_0|0 |
|4fa035b3-dcd5-40e1-9107-0a0c943ff597_1|0 |
|4fa035b3-dcd5-40e1-9107-0a0c943ff597_1|0 |
|529704d2-5a60-4718-a03f-639e040f6634_0|0 |
|560f6978-028b-4a37-9f97-d97e93976bf7_0|0 |
|560f6978-028b-4a37-9f97-d97e93976bf7_0|0 |
|560f6978-028b-4a37-9f97-d97e93976bf7_0|0 |
|778e3b8a-2ca5-469a-9697-f646962e8308_0|0 |
|8b06ba24-2af3-4eec-811a-4d1779f37876_0|0 |
+--------------------------------------+----------+
2nd Run Sample
+--------------------------------------+----------+
|clientid |Conversion|
+--------------------------------------+----------+
|02438b66-2de4-4765-bae3-de7453647ea7_1|1 |
|7797aba3-3eea-4556-856e-753812b4b551_0|1 |
|870ab2a5-0650-42b8-9e6f-bde3859f64fd_0|1 |
|870ab2a5-0650-42b8-9e6f-bde3859f64fd_1|1 |
|be218b72-c664-40cf-adf5-e3519095e941_0|1 |
|be218b72-c664-40cf-adf5-e3519095e941_0|1 |
|1dfa820c-77e0-4927-8a39-ecd8e842b09b_0|0 |
|252be902-4204-40a5-9d3c-dd3a7d0f0355_0|0 |
|2995b49d-525b-43e9-ab36-8b8910a4607c_0|0 |
|2bc06b59-4624-4ddd-87a3-ed04cba88233_0|0 |
|31563716-9380-4662-90e5-7f63a1ab9072_0|0 |
|5ca3b5bc-35a9-42a5-bd37-a8fc94366dc6_0|0 |
|5d5f2ea0-aed9-4c2d-8c22-68859ec35e8e_0|0 |
|5f9ebf92-3b1b-4628-b949-44a32e6d3659_0|0 |
|5f9ebf92-3b1b-4628-b949-44a32e6d3659_0|0 |
|5f9ebf92-3b1b-4628-b949-44a32e6d3659_0|0 |
|6b352714-af74-4773-854b-073e644e8684_0|0 |
|6e528e49-472e-48c7-baa9-edc25303e427_0|0 |
|6e528e49-472e-48c7-baa9-edc25303e427_0|0 |
|741630e0-1c99-497d-a127-5c4c562952c5_0|0 |
|03358d8f-9b9c-4258-9c99-234ab102c29b_1|0 |
|040d213c-e91a-42f4-9bf7-90671670dc17_0|0 |
|040d213c-e91a-42f4-9bf7-90671670dc17_0|0 |
|04fe5148-1c56-4c88-aed0-1f01220bffd6_0|0 |
|129eb883-159d-49be-b8ae-9aa44a3e2919_0|0 |
|1855d96d-3647-4c4f-a20f-7e46f7635798_0|0 |
|3c37e066-dff5-4bd9-84ab-b9e73f3f3fdd_0|0 |
|3e998096-3a4b-4b57-a1de-69d2dbd19abd_0|0 |
|3f8ace3c-d378-4423-97a0-3d9cf35ba256_0|0 |
|49a0cfb8-490f-4252-84fa-2b9e250e9333_0|0 |
|4fa035b3-dcd5-40e1-9107-0a0c943ff597_1|0 |
|4fa035b3-dcd5-40e1-9107-0a0c943ff597_1|0 |
|529704d2-5a60-4718-a03f-639e040f6634_0|0 |
|560f6978-028b-4a37-9f97-d97e93976bf7_0|0 |
|560f6978-028b-4a37-9f97-d97e93976bf7_0|0 |
|560f6978-028b-4a37-9f97-d97e93976bf7_0|0 |
|778e3b8a-2ca5-469a-9697-f646962e8308_0|0 |
|8b06ba24-2af3-4eec-811a-4d1779f37876_0|0 |
+--------------------------------------+----------+
The two samples being different could be a roadblock for me, and I just want to check how I could make them consistent.
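A possible explanation, not confirmed in this thread: a seeded sample() can only repeat itself if the sampler sees the same rows, in the same order, in the same partitions on every run. The sketch below is one way to pin that down before sampling. The helper name stableSample, the toy data, and the assumption that clientid is unique per row are all mine, not from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, rand}

object StableSampling {
  // Hypothetical helper: fixes partitioning and row order before the seeded
  // sample, so the sampler is fed identical input on every evaluation.
  // Assumes `clientid` is unique per row; add tie-breaker columns otherwise.
  def stableSample(df: DataFrame, fraction: Double, seed: Long, maxRows: Int): DataFrame =
    df.repartition(1)                         // a single partition, so rows cannot move between partitions
      .sortWithinPartitions(col("clientid"))  // deterministic row order inside that partition
      .sample(true, fraction, seed)           // with replacement, as in the question
      .limit(maxRows)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("stable-sample")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for sortedConversionSubset; orderBy(rand()) reshuffles
    // the rows each time the plan is evaluated.
    val subset = (1 to 15)
      .map(i => (f"client-$i%02d", 1))
      .toDF("clientid", "Conversion")
      .orderBy(rand())

    val first  = stableSample(subset, 0.45, 3L, 6).collect().toSeq
    val second = stableSample(subset, 0.45, 3L, 6).collect().toSeq
    println(s"identical across evaluations: ${first == second}")

    spark.stop()
  }
}
```

Collapsing to one partition trades parallelism for repeatability, which is probably acceptable for subsets as small as the ones shown here. For large inputs, persisting or writing out the sorted subset once and sampling from that copy may be a better fit.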

Can I specify the output in a Test2 test?

Is there a way to edit the following Test2 test:
perl -MTest2::V0 -MRegexp::Common=URI -wE 'is( shift, match $RE{URI}, "should match a URI" )' foo
Such that the CHECK output is less verbose than this:
# Seeded srand with seed '20200813' from local date.
1..1
not ok 1 - should match a URI
# Failed test 'should match a URI'
# at -e line 1.
# +-----+----+-------------------------------------------------------+-----+
# | GOT | OP | CHECK | LNs |
# +-----+----+-------------------------------------------------------+-----+
# | foo | =~ | (?:(?:(?:file)://(?:(?:(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][ | 1 |
# | | | -a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9 | |
# | | | ]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[ | |
# | | | .][0-9]+))|localhost)?)(?:/(?:(?:(?:(?:[-a-zA-Z0-9$_. | |
# | | | +!*'(),:#&=]|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:/(?:(?: | |
# | | | [-a-zA-Z0-9$_.+!*'(),:#&=]|(?:%[a-fA-F0-9][a-fA-F0-9] | |
# | | | ))*))*)))))|(?:(?:telnet)://(?:(?:(?:(?:(?:[-a-zA-Z0- | |
# | | | 9$_.+!*'(),;?&=]|(?:%[a-fA-F0-9][a-fA-F0-9]))*))(?::( | |
# | | | ?:(?:(?:[-a-zA-Z0-9$_.+!*'(),;?&=]|(?:%[a-fA-F0-9][a- | |
# | | | fA-F0-9]))*)))?)#)?(?:(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][- | |
# | | | a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9] | |
# | | | *[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[. | |
# | | | ][0-9]+)))(?::(?:(?:[0-9]+)))?)(?:/)?)|(?:(?:https?): | |
# | | | //(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z | |
# | | | 0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z | |
# | | | ])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(? | |
# | | | :(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:(?:[a-zA-Z0-9\-_.!~* | |
# | | | '():#&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?: | |
# | | | [a-zA-Z0-9\-_.!~*'():#&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0 | |
# | | | -9]))*))*)(?:/(?:(?:(?:[a-zA-Z0-9\-_.!~*'():#&=+$,]+| | |
# | | | (?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_ | |
# | | | .!~*'():#&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))*) | |
# | | | )(?:[?](?:(?:(?:[;/?:#&=+$,a-zA-Z0-9\-_.!~*'()]+|(?:% | |
# | | | [a-fA-F0-9][a-fA-F0-9]))*)))?))?)|(?:(?:gopher)://(?: | |
# | | | (?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9]) | |
# | | | [.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|( | |
# | | | ?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9] | |
# | | | +)))?/(?:(?:(?:[0-9+IgT]))(?:(?:(?:[-a-zA-Z0-9$_.+!*' | |
# | | | (),:#&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))))|(?:(?:pop | |
# | | | )://(?:(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),&=~]+|(?:%[a-fA- | |
# | | | F0-9][a-fA-F0-9]))+))(?:;AUTH=(?:[*]|(?:(?:(?:[-a-zA- | |
# | | | Z0-9$_.+!*'(),&=~]+|(?:%[a-fA-F0-9][a-fA-F0-9]))+)|(? | |
# | | | :[+](?:APOP|(?:(?:[-a-zA-Z0-9$_.+!*'(),&=~]+|(?:%[a-f | |
# | | | A-F0-9][a-fA-F0-9]))+))))))?#)?(?:(?:(?:(?:(?:(?:[a-z | |
# | | | A-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a | |
# | | | -zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[. | |
# | | | ][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?)|(?:(?:prospe | |
# | | | ro)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a- | |
# | | | zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a- | |
# | | | zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(? | |
# | | | :(?:[0-9]+)))?/(?:(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),?:#&= | |
# | | | ]|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:/(?:(?:[-a-zA-Z0-9 | |
# | | | $_.+!*'(),?:#&=]|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*))(? | |
# | | | :(?:;(?:(?:[-a-zA-Z0-9$_.+!*'(),?:#&]|(?:%[a-fA-F0-9] | |
# | | | [a-fA-F0-9]))*)=(?:(?:[-a-zA-Z0-9$_.+!*'(),?:#&]|(?:% | |
# | | | [a-fA-F0-9][a-fA-F0-9]))*))*))|(?:(?:wais)://(?:(?:(? | |
# | | | :(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])* | |
# | | | (?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0- | |
# | | | 9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))? | |
# | | | /(?:(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),]|(?:%[a-fA-F0-9][a | |
# | | | -fA-F0-9]))*))(?:[?](?:(?:(?:[-a-zA-Z0-9$_.+!*'(),;:# | |
# | | | &=]|(?:%[a-fA-F0-9][a-fA-F0-9]))*))|/(?:(?:(?:[-a-zA- | |
# | | | Z0-9$_.+!*'(),]|(?:%[a-fA-F0-9][a-fA-F0-9]))*))/(?:(? | |
# | | | :(?:[-a-zA-Z0-9$_.+!*'(),]|(?:%[a-fA-F0-9][a-fA-F0-9] | |
# | | | ))*)))?))|(?:(?:tel):(?:(?:(?:[+](?:[0-9\-.()]+)(?:;i | |
# | | | sub=[0-9\-.()]+)?(?:;postd=[0-9\-.()*#ABCDwp]+)?(?:(? | |
# | | | :;(?:phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\ | |
# | | | -.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%( | |
# | | | ?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa- | |
# | | | f]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_ | |
# | | | a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa- | |
# | | | f]|7[0-9A-Ea-e])))*)))|(?:;(?:tsp)=(?: |(?:(?:(?:[A-Z | |
# | | | a-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?)(?:[. | |
# | | | ](?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9] | |
# | | | )?))*))))|(?:;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7AB | |
# | | | DEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[ | |
# | | | 0-9ACEace]))*)(?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%( | |
# | | | ?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9 | |
# | | | A-Fa-f]|7[0-9ACEace]))*)(?:[?](?:(?:[!'*\-.0-9A-Z_a-z | |
# | | | ~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef] | |
# | | | |6[0-9A-Fa-f]|7[0-9ACEace]))*))?)|(?:%22(?:(?:%5C(?:[ | |
# | | | a-zA-Z0-9\-_.!~*'()]|(?:%[a-fA-F0-9][a-fA-F0-9])))|[a | |
# | | | -zA-Z0-9\-_.!~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013-9A | |
# | | | -Fa-f]|[3-9A-Fa-f][a-fA-F0-9]))*%22)))?))*)|(?:[0-9\- | |
# | | | .()*#ABCDwp]+(?:;isub=[0-9\-.()]+)?(?:;postd=[0-9\-.( | |
# | | | )*#ABCDwp]+)?(?:;(?:phone-context)=(?:(?:(?:[+][0-9\- | |
# | | | .()]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e- | |
# | | | oq-vx-z~]|(?:%(?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa- | |
# | | | f]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[ | |
# | | | !'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f] | |
# | | | |[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))(?:(?:;(?:phone- | |
# | | | context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDw | |
# | | | p]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CF | |
# | | | cf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A- | |
# | | | Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:% | |
# | | | (?:2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-E | |
# | | | a-e])))*)))|(?:;(?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:( | |
# | | | ?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?)(?:[.](?:[A-Za-z | |
# | | | ](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?))*))))|( | |
# | | | ?:;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0 | |
# | | | -9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]) | |
# | | | )*)(?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABD | |
# | | | Eabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0 | |
# | | | -9ACEace]))*)(?:[?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[1 | |
# | | | 3-7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa- | |
# | | | f]|7[0-9ACEace]))*))?)|(?:%22(?:(?:%5C(?:[a-zA-Z0-9\- | |
# | | | _.!~*'()]|(?:%[a-fA-F0-9][a-fA-F0-9])))|[a-zA-Z0-9\-_ | |
# | | | .!~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013-9A-Fa-f]|[3-9 | |
# | | | A-Fa-f][a-fA-F0-9]))*%22)))?))*))))|(?:(?:ftp)://(?:( | |
# | | | ?:(?:(?:[a-zA-Z0-9\-_.!~*'();:&=+$,]+|(?:%[a-fA-F0-9] | |
# | | | [a-fA-F0-9]))*))(?:)#)?(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][ | |
# | | | -a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9 | |
# | | | ]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0- | |
# | | | 9]+[.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(? | |
# | | | :[a-zA-Z0-9\-_.!~*'():#&=+$,]+|(?:%[a-fA-F0-9][a-fA-F | |
# | | | 0-9]))*)(?:/(?:(?:[a-zA-Z0-9\-_.!~*'():#&=+$,]+|(?:%[ | |
# | | | a-fA-F0-9][a-fA-F0-9]))*))*))(?:;type=(?:[AIai]))?))? | |
# | | | )|(?:(?:tv):(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)? | |
# | | | [a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]| | |
# | | | [a-zA-Z])[.]?))?)|(?:(?:news):(?:(?:[*]|(?:(?:[-a-zA- | |
# | | | Z0-9$_.+!*'(),;/?:&=]|(?:%[a-fA-F0-9][a-fA-F0-9]))+#( | |
# | | | ?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[ | |
# | | | .])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(? | |
# | | | :[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))|(?:[a-zA-Z][-A- | |
# | | | Za-z0-9.+_]*))))|(?:(?:fax):(?:(?:(?:[+](?:[0-9\-.()] | |
# | | | +)(?:;isub=[0-9\-.()]+)?(?:;tsub=[0-9\-.()]+)?(?:;pos | |
# | | | td=[0-9\-.()*#ABCDwp]+)?(?:(?:;(?:phone-context)=(?:( | |
# | | | ?:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[ | |
# | | | !'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CFcf]|3[AC-Fac- | |
# | | | f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689 | |
# | | | A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa- | |
# | | | f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))|( | |
# | | | ?:;(?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:(?:[-A-Za-z0-9 | |
# | | | ]+)){0,61}[A-Za-z0-9])?)(?:[.](?:[A-Za-z](?:(?:(?:[-A | |
# | | | -Za-z0-9]+)){0,61}[A-Za-z0-9])?))*))))|(?:;(?:(?:[!'* | |
# | | | \-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1-9A-Fa | |
# | | | -f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:=(?:(?: | |
# | | | (?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9] | |
# | | | |4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*) | |
# | | | (?:[?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]| | |
# | | | 3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEac | |
# | | | e]))*))?)|(?:%22(?:(?:%5C(?:[a-zA-Z0-9\-_.!~*'()]|(?: | |
# | | | %[a-fA-F0-9][a-fA-F0-9])))|[a-zA-Z0-9\-_.!~*'()]+|(?: | |
# | | | %(?:[01][a-fA-F0-9])|2[013-9A-Fa-f]|[3-9A-Fa-f][a-fA- | |
# | | | F0-9]))*%22)))?))*)|(?:[0-9\-.()*#ABCDwp]+(?:;isub=[0 | |
# | | | -9\-.()]+)?(?:;tsub=[0-9\-.()]+)?(?:;postd=[0-9\-.()* | |
# | | | #ABCDwp]+)?(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.( | |
# | | | )]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq | |
# | | | -vx-z~]|(?:%(?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f] | |
# | | | |5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[!' | |
# | | | ()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f]|[ | |
# | | | 4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))(?:(?:;(?:phone-co | |
# | | | ntext)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDwp] | |
# | | | +))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CFcf | |
# | | | ]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa | |
# | | | -f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(? | |
# | | | :2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea- | |
# | | | e])))*)))|(?:;(?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:(?: | |
# | | | [-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?)(?:[.](?:[A-Za-z]( | |
# | | | ?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9])?))*))))|(?: | |
# | | | ;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9 | |
# | | | ]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))* | |
# | | | )(?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEa | |
# | | | bde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9 | |
# | | | ACEace]))*)(?:[?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13- | |
# | | | 7ABDEabde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f] | |
# | | | |7[0-9ACEace]))*))?)|(?:%22(?:(?:%5C(?:[a-zA-Z0-9\-_. | |
# | | | !~*'()]|(?:%[a-fA-F0-9][a-fA-F0-9])))|[a-zA-Z0-9\-_.! | |
# | | | ~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013-9A-Fa-f]|[3-9A- | |
# | | | Fa-f][a-fA-F0-9]))*%22)))?))*))))|(?:(?:nntp)://(?:(? | |
# | | | :(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0 | |
# | | | -9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z] | |
# | | | ))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[ | |
# | | | 0-9]+)))?)/(?:(?:[a-zA-Z][-A-Za-z0-9.+_]*))(?:/(?:[0- | |
# | | | 9]+))?))) | |
# +-----+----+-------------------------------------------------------+-----+
Can I do anything to make the output look something like:
# +-----+----+-------------------------------------------------------+-----+
# | GOT | OP | CHECK | LNs |
# +-----+----+-------------------------------------------------------+-----+
# | foo | =~ | $RE{URI} | 1 |
# +-----+----+-------------------------------------------------------+-----+
It seems that it is not possible to format the check message. Looking at the source:
Test2::Tools::Compare::is() calls
Test2::Compare::Delta::diag(), which calls
Test2::Compare::Delta::table() to format the output.
You could try asking for this as a new feature on the GitHub issue tracker.

Spark Dataframe Union giving duplicates

I have a base dataset, and one of its columns contains both null and non-null values,
so I do:
val nonTrained_ds = base_ds.filter(col("col_name").isNull)
val trained_ds = base_ds.filter(col("col_name").isNotNull)
When I print these out, I get a clear separation of rows. But when I do
val combined_ds = nonTrained_ds.union(trained_ds)
I get duplicate records of rows from nonTrained_ds, and the strange thing is that the rows from trained_ds are no longer in the combined ds.
Why does this happen?
The values of trained_ds are:
+----------+----------------+
|unique_no | running_id|
+----------+----------------+
|0456700001|16 |
|0456700004|16 |
|0456700007|16 |
|0456700010|16 |
|0456700013|16 |
|0456700016|16 |
|0456700019|16 |
|0456700022|16 |
|0456700025|16 |
|0456700028|16 |
|0456700031|16 |
|0456700034|16 |
|0456700037|16 |
|0456700040|16 |
|0456700043|16 |
|0456700046|16 |
|0456700049|16 |
|0456700052|16 |
|0456700055|16 |
|0456700058|16 |
|0456700061|16 |
|0456700064|16 |
|0456700067|16 |
|0456700070|16 |
+----------+----------------+
The values of nonTrained_ds are:
+----------+----------------+
|unique_no | running_id|
+----------+----------------+
|0456700002|null |
|0456700003|null |
|0456700005|null |
|0456700006|null |
|0456700008|null |
|0456700009|null |
|0456700011|null |
|0456700012|null |
|0456700014|null |
|0456700015|null |
|0456700017|null |
|0456700018|null |
|0456700020|null |
|0456700021|null |
|0456700023|null |
|0456700024|null |
|0456700026|null |
|0456700027|null |
|0456700029|null |
|0456700030|null |
|0456700032|null |
|0456700033|null |
|0456700035|null |
|0456700036|null |
|0456700038|null |
|0456700039|null |
|0456700041|null |
|0456700042|null |
|0456700044|null |
|0456700045|null |
|0456700047|null |
|0456700048|null |
|0456700050|null |
|0456700051|null |
|0456700053|null |
|0456700054|null |
|0456700056|null |
|0456700057|null |
|0456700059|null |
|0456700060|null |
|0456700062|null |
|0456700063|null |
|0456700065|null |
|0456700066|null |
|0456700068|null |
|0456700069|null |
|0456700071|null |
|0456700072|null |
+----------+----------------+
The values of the combined ds are:
+----------+----------------+
|unique_no | running_id|
+----------+----------------+
|0456700002|null |
|0456700003|null |
|0456700005|null |
|0456700006|null |
|0456700008|null |
|0456700009|null |
|0456700011|null |
|0456700012|null |
|0456700014|null |
|0456700015|null |
|0456700017|null |
|0456700018|null |
|0456700020|null |
|0456700021|null |
|0456700023|null |
|0456700024|null |
|0456700026|null |
|0456700027|null |
|0456700029|null |
|0456700030|null |
|0456700032|null |
|0456700033|null |
|0456700035|null |
|0456700036|null |
|0456700038|null |
|0456700039|null |
|0456700041|null |
|0456700042|null |
|0456700044|null |
|0456700045|null |
|0456700047|null |
|0456700048|null |
|0456700050|null |
|0456700051|null |
|0456700053|null |
|0456700054|null |
|0456700056|null |
|0456700057|null |
|0456700059|null |
|0456700060|null |
|0456700062|null |
|0456700063|null |
|0456700065|null |
|0456700066|null |
|0456700068|null |
|0456700069|null |
|0456700071|null |
|0456700072|null |
|0456700002|16 |
|0456700005|16 |
|0456700008|16 |
|0456700011|16 |
|0456700014|16 |
|0456700017|16 |
|0456700020|16 |
|0456700023|16 |
|0456700026|16 |
|0456700029|16 |
|0456700032|16 |
|0456700035|16 |
|0456700038|16 |
|0456700041|16 |
|0456700044|16 |
|0456700047|16 |
|0456700050|16 |
|0456700053|16 |
|0456700056|16 |
|0456700059|16 |
|0456700062|16 |
|0456700065|16 |
|0456700068|16 |
|0456700071|16 |
+----------+----------------+
This did the trick:
val nonTrained_ds = base_ds.filter(col("primary_offer_id").isNull).distinct()
val trained_ds = base_ds.filter(col("primary_offer_id").isNotNull).distinct()
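The post does not show how base_ds is built, so the following is only a guess at the cause. If base_ds comes from a non-deterministic computation, each filter (and the union) may re-evaluate it and see different rows. A minimal sketch under that assumption materialises the base data once before splitting. The helper name splitOnNull and the toy rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

object SplitAndUnion {
  // Hypothetical helper: splits `base` into (null, non-null) rows of `column`,
  // with both halves derived from one cached copy of the input.
  def splitOnNull(base: DataFrame, column: String): (DataFrame, DataFrame) = {
    val stable = base.persist(StorageLevel.MEMORY_AND_DISK)
    stable.count() // forces evaluation so later filters can read the cached rows
    (stable.filter(col(column).isNull), stable.filter(col(column).isNotNull))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("split-and-union")
      .getOrCreate()
    import spark.implicits._

    // Toy stand-in for base_ds, using the column names from the printed tables.
    val base = Seq[(String, Option[Int])](
      ("0456700001", Some(16)),
      ("0456700002", None),
      ("0456700003", None),
      ("0456700004", Some(16))
    ).toDF("unique_no", "running_id")

    val (nonTrained, trained) = splitOnNull(base, "running_id")
    val combined = nonTrained.union(trained)
    println(s"nonTrained=${nonTrained.count()} trained=${trained.count()} combined=${combined.count()}")

    spark.stop()
  }
}
```

Caching is best effort: evicted or lost partitions are recomputed from the original lineage, so writing the base data out and reading it back would be a sturdier way to test this theory.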

How to list Rackspace servers filtered by metadata using REST API?

I can see that it is possible to add metadata to a Rackspace virtual machine instance.
I want to get a list of running instances, filtered by a particular metatag value.
However, I can't see how to do so in the documentation.
Is it possible?
You should be able to do so using the openstack client... but it depends on which metatag you're interested in.
You can get a list of all servers:
openstack server list
This will print something like:
+--------------------------------------+------------------+--------+-----------------------------------------------------------------------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+------------------+--------+-----------------------------------------------------------------------------------------------------------+
| 97606ae9-7f18-4a3c-903a-1583d446119b | trysmallwin | ERROR | |
| cb78b8d5-2f03-4a3f-ab26-f389acbd0b76 | Win-try again | ERROR | public=2607:f298:5:101d:f816:3eff:fe9e:5cd4, 208.113.133.90, 2607:f298:5:101d:f816:3eff:fe36:da45, |
| | | | 208.113.133.93, 2607:f298:5:101d:f816:3eff:fe40:57d5, 208.113.133.95 |
| 040751d1-c4c5-47aa-8dec-1d69a468be1c | hnxhdkwskrvwvdwr | ACTIVE | public=2607:f298:5:101d:f816:3eff:fe60:324, 208.113.130.52 |
+--------------------------------------+------------------+--------+-----------------------------------------------------------------------------------------------------------+
Note the ID of the server and look deeper:
openstack server show 040751d1-c4c5-47aa-8dec-1d69a468be1c
+--------------------------------------+------------------------------------------------------------+
| Field | Value |
+--------------------------------------+------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | iad-2 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2016-07-26T17:32:01.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | public=2607:f298:5:101d:f816:3eff:fe60:324, 208.113.130.52 |
| config_drive | True |
| created | 2016-07-26T17:31:51Z |
| flavor | gp1.semisonic (50) |
| hostId | e1efd75d1e8f6a7f5bb228a35db13647281996087d39c65af8ce83d9 |
| id | 040751d1-c4c5-47aa-8dec-1d69a468be1c |
| image | Ubuntu-14.04 (03f89ff2-d66e-49f5-ae61-656a006bbbe9) |
| key_name | stef |
| name | hnxhdkwskrvwvdwr |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| project_id | d2fb6996496044158cf977c2129c8660 |
| properties | |
| security_groups | [{u'name': u'default'}] |
| status | ACTIVE |
| updated | 2016-07-26T17:32:01Z |
| user_id | 5b2ca246f39a425f9a833460bf322603 |
+--------------------------------------+------------------------------------------------------------+
Adding -f json (for example, openstack server show -f json 040751d1-c4c5-47aa-8dec-1d69a468be1c) outputs the same information in JSON format, which you can manipulate programmatically more easily.
HTH

Emacs Org-mode table $> reference does not work

GNU Emacs 24.4.1 org-mode
Here is an org-mode table
#+TBLNAME: revenue
| / | < | | < | | < | | | | | | | | | | | |
| Product | Year_SUM | Month_SUM | Platform | Platform_SUM | adwo | AdMob | adChina | adSage | appfigures | appdriver | coco | Domob | Dianru | Limei | guohead | youmi |
| | | | | | | | | | | | | | | | | |
|---------+----------+-----------+----------+------------------+------+-------+---------+--------+------------+-----------+------+-------+--------+-------+---------+-------|
| Jan | | | iOS | #ERROR | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | | | Android | =vsum($6..$>);NE | | 1 | | 1 | | 1 | | 1 | | 1 | | 1 |
|---------+----------+-----------+----------+------------------+------+-------+---------+--------+------------+-----------+------+-------+--------+-------+---------+-------|
| | | | | | | | | | | | | | | | | |
#+TBLFM: $5=vsum($6..$>);NE
As you can see, the formula $5=vsum($6..$>);NE can't be calculated. Here is the debug info:
Substitution history of formula
Orig: vsum($6..$>)
$xyz-> vsum($6..$>)
#r$c-> vsum($6..$>)
$1-> vsum((0)..$>)
--------^
Error: Expected `)'
But if I replace the formula with $5=vsum($6..$17), it works. I can't figure out where the problem is.
I need some help and would appreciate it!

Per-table constants are not being substituted

Here is my table
| | Name | Vert | Horz | Area | Cost | USD |
|---+-------------+---------+-------+--------+------+-----|
| $ | $price = 75 | $Hi=2.9 | | | | |
| # | Kitchen | 4.160 | 3.630 | #ERROR | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
#+TBLFM: $5=$4*$Hi*2+$3*$Hi*2
Here is the trace output:
Substitution history of formula
Orig: $4*$Hi*2+$3*$Hi*2
$xyz-> $4*(#UNDEFINED_NAME)*2+$3*(#UNDEFINED_NAME)*2
#r$c-> $4*(#UNDEFINED_NAME)*2+$3*(#UNDEFINED_NAME)*2
$1-> (3.630)*(#UNDEFINED_NAME)*2+(4.160)*(#UNDEFINED_NAME)*2
---------^
Error: #'s not allowed in this context
What's wrong? Why was $Hi not substituted?
Ah, found it myself. Here is the wrong row, followed by the fixed row:
| $ | $price = 75 | $Hi=2.9 | | | | |
| $ | price = 75 | Hi=2.9 | | | | |