How to remove scientific notation in pyspark? - pyspark

I'm doing a division between two fields, but scientific notation always appears at the end of the number:
Example:
636.57 / 1031995.85 = 6.168338758338999E-4
My code is:
from pyspark.sql.functions import coalesce, col, lit
df.withColumn("value", coalesce(col("value_1"), lit(0)) / coalesce(col("value_2"), lit(0)))
When I convert using cast("decimal(18,14)"), the result is:
6.1683386E-4
Can anyone help me?
I need the result: 6.168338758338999
I tried something similar, but it didn't work either:
How to turn off scientific notation in pyspark?
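The exponent is only a display form of the underlying double. As a minimal sketch in plain Python (not PySpark itself), the same quotient can be rendered in fixed-point notation with a format specifier. The variable names are illustrative, and the 14 decimal places are chosen to mirror decimal(18,14). Note that E-4 is part of the value, so the fixed-point form starts with 0.0006 rather than 6.16. In PySpark, pyspark.sql.functions.format_number(col, d) returns a comparable fixed-point string, though whether it fits this pipeline is an assumption.

```python
# Same inputs as the example in the question
value_1 = 636.57
value_2 = 1031995.85

# Plain float division; the default text form of a small float may use an exponent
ratio = value_1 / value_2

# Fixed-point formatting with 14 decimal places never emits an exponent
fixed = format(ratio, ".14f")

print(fixed)
```

The result is a string, so it suits display or export rather than further arithmetic.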

Related

Phonetic Algorithms for Postgresql

I am working on a PoC for Person Real-time Identification, and one of its critical aspects is supporting both minor misspellings and phonetic variations of first, middle, and last names, like HarinGton == HarrinBton or RaphEAl == RafAEl. It's working for longer names, but it's a bit more imprecise for names like Lee and John.
I am using Double Metaphone through dmetaphone() and dmetaphone_alt() in PostgreSQL 13.3 (Supabase.io). Although I appreciate Double Metaphone, its output is a (too?) short string. metaphone() has a parameter that allows a longer phonetic representation. I investigated dmetaphone() and couldn't find anything other than the default function.
Is there a way to make dmetaphone() and dmetaphone_alt() return a longer phonetic representation, similar to metaphone()'s, but with an ALT variation?
Any help would be much appreciated.
Thanks
Looking at the Postgres docs for these functions, you don't have parametric control over the length of the encoded string for Double Metaphone. In the case of single Metaphone, you can only truncate the output string:
max_output_length sets the maximum length of the output metaphone code; if longer, the output is truncated to this length.
However, you may get much better results by using Trigram Similarity or Levenshtein Distance on the encoded output from either of the metaphone methods. This can be a more powerful way to handle phonetic permutations using Metaphones.
Example
Consider all the possible spelling permutations for the artist Cyndi Lauper. Using Double Metaphone with trigram similarity, we can achieve 100% similarity between the incorrect string cindy lorper and the correct spelling:
SELECT similarity(dmetaphone('cindy lorper'), dmetaphone('cyndi lauper'));
yields: similarity real: 1 (i.e. 100% similarity)
This means the encodings are identical for both input strings using Double Metaphone. With Metaphone, they're slightly different. All three of the following misspellings yield SNTLRPR:
SELECT metaphone('cyndy lorper',10);
SELECT metaphone('sinday lorper', 10);
SELECT metaphone('cinday laurper', 10);
whereas the correct spelling
SELECT metaphone('cyndi lauper',10);
yields: SNTLPR, which is only one character different from SNTLRPR.
You can also use Levenshtein Distance to compare the two codes, which gives you a number you can filter on:
SELECT levenshtein(metaphone('sinday lorper', 10), metaphone('cyndi lauper', 10));
yields: levenshtein integer: 1
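To make that distance concrete, here is a minimal Python sketch of the classic dynamic-programming edit distance, applied to the two Metaphone codes quoted above. It is an illustration of the metric, not the implementation PostgreSQL uses, and the function and variable names are my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


misspelled_code = "SNTLRPR"  # code quoted above for the misspellings
correct_code = "SNTLPR"      # code quoted above for 'cyndi lauper'

distance = levenshtein(misspelled_code, correct_code)
print(distance)
```

Dropping the extra R turns one code into the other, which is why the distance is 1.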
"It's working for longer names, but it's a bit more imprecise for names like Lee and John."
Without a more complete reprex, it's a bit difficult to see exactly what you're having trouble with.
SELECT similarity(dmetaphone('lee'), dmetaphone('leigh'));
SELECT similarity(dmetaphone('jon'), dmetaphone('john'));
both yield: similarity real: 1 (i.e. 100% similarity)
Edit: here's an easy-to-follow guide for fuzzy matching with Postgres

Integer to scientific notation

I needed to convert a large integer to and from exponential notation (or E notation), and all the questions I found were about converting from exponential notation.
That direction is easy: just enter the number as is, e.g. 2.57588E13, and PowerShell will convert it for you automatically.
However, converting it back is not obvious, and I had to resort to C# forums to finally find the answer.
Example:
$bignumber = 2.57588E13
Write-Output "Result: $bignumber"
Result: 25758800000000
Write-Output $bignumber.tostring("e5")
2.57588e+013
The number after the "e" in the format string is the number of decimals that should be visible in the result. Hope this helps someone else.
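For comparison, the same idea can be sketched in Python, where the digit after the dot in the format specifier plays the role of the digit in "e5". This is an analogy only, not part of the PowerShell answer, and Python pads the exponent to two digits rather than three.

```python
big_number = 25758800000000

# ".5e" asks for exponential notation with 5 digits after the decimal point
as_e_notation = format(big_number, ".5e")

# Parsing the text back recovers the original integer
round_trip = int(float(as_e_notation))

print(as_e_notation)  # 2.57588e+13
print(round_trip)     # 25758800000000
```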

Specify numeric formats in the infix command

I want to import a delimited text file into Stata. Some of the fields are numeric, with the numbers formatted with commas (e.g. 2,144.20). When I specify a numeric data type in the infix command for these columns, the values are read in as missing.
infix 2 first str id 2-15 double amount 16-25 using "{datasetname}"
Is there a way to specify the numeric format (e.g. %20.2fc) so that Stata does not treat them as non-numeric? Another way is to import the field as a string and convert it to numeric later, but I want to see if there is a way to specify the format in the infix command itself.
There is no such syntax. It would not even make sense from a Stata point of view as a format such as %20.2fc is a display format and controls what is shown (output), not what is read in (input).
Use destring, ignore(",") replace to fix such variables after reading them in.
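The read-as-string-then-convert approach amounts to dropping the thousands separators before parsing. A small Python sketch of that idea follows. It is an analogy for what the destring step does, not Stata code, and the sample values are made up apart from 2,144.20.

```python
# Values as they would arrive when the column is read as text
raw_amounts = ["2,144.20", "15.00", "1,000,000.5"]

# Remove the commas first, then parse what is left as a number
amounts = [float(text.replace(",", "")) for text in raw_amounts]

print(amounts)
```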

Progress 4GL: Formatting decimals through temp-table declaration

I'm trying to format decimals so that they round to the nearest hundredth, using a temp-table declaration similar to this:
DEFINE TEMP-TABLE foo
FIELD random-decimal AS DECIMAL FORMAT "->>>,>>>,>>>.99".
The end result is written to a report, which I output with the following:
EXPORT STREAM sStream DELIMITER ',' foo.
This does not seem to work as I intend. I'm still receiving values like this: 0.000073.
Does anyone have any insight into what I'm doing wrong? I was unable to find anything for this specific case anywhere online.
FORMAT has no impact on storage. It is only a "hint" for default display and input purposes.
What you want is the "decimals" field attribute:
DEFINE TEMP-TABLE foo
FIELD random-decimal AS DECIMAL decimals 2 FORMAT "->>>,>>>,>>>.99".
create foo.
random-decimal = 1.12345.
display random-decimal format ">.9999".
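The difference between a display format and a storage attribute can be sketched in Python with the decimal module. This is a rough analogy, not Progress code, and the half-up rounding mode is an assumption about how the stored value gets rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

stored = Decimal("0.000073")

# A display format only changes the text that is shown
shown = format(stored, ".2f")

# Rounding the value itself is what a storage-level setting does
rounded = Decimal("1.12345").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(shown)    # 0.00
print(stored)   # 0.000073, the value itself is unchanged
print(rounded)  # 1.12
```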

zip code + 4 mail merge treated like an arithmetic expression

I'm trying to do a simple mail merge in Word 2010, but when I insert an Excel field that's supposed to represent a zip code from Connecticut (e.g. 06880), I run into 2 problems:
The leading zero gets suppressed, so 06880 becomes 6880. I know that I can toggle the field code and change it to {MERGEFIELD ZipCode \# 00000}, and that at least works.
But here's the real problem I can't seem to figure out:
A zip+4 field such as 06470-5530 gets treated like an arithmetic expression: 6470 - 5530 = 940, so with the field code above it becomes 00940, which is wrong.
Is there something in my Excel spreadsheet or an option in Word that I need to set to make this work properly? Please advise, thanks.
See macropod's post in this conversation
As long as the ZIP codes are reaching Word (with or without "-" signs in the 5+4 format ZIPs), his field code should sort things out. However, if you are mixing text and numeric formats in your Excel column, there is a danger that the OLE DB provider or ODBC driver, if that is what you are using to get the data, will treat the column as numeric and return all the text values as 0.
Yes, Word sometimes treats text strings as numeric expressions, as you have noticed. It will do that when you try to apply a numeric format, when you try to do a calculation in an { = } field, when you sum table cell contents in an { = } field, or when Word decides to do a numeric comparison in (say) an { IF } field. In the latter case you can get Word to treat the expression as a string by surrounding the comparands with double quotes.
In Excel, to force the string data type when entering data that looks like a number, a date, a fraction, etc. but is not numeric (zip code, phone number, etc.), simply type an apostrophe before the data.
Typing 06470 will be interpreted as the number 6470, but typing '06470 will give the string "06470".
The simplest fix I've found is to save the Excel file as CSV. Word takes it all at face value then.
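A CSV file holds plain text, so a value such as 06470-5530 is stored character for character. The following Python sketch shows that round trip with the standard csv module. The column names and sample rows are hypothetical, and how Word then reads the file is not modelled here.

```python
import csv
import io

# Hypothetical sample rows; ZIP codes are kept as strings throughout
rows = [
    {"Name": "A. Example", "ZipCode": "06880"},
    {"Name": "B. Example", "ZipCode": "06470-5530"},
]

# Write the rows to an in-memory CSV, as a save to disk would
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Name", "ZipCode"])
writer.writeheader()
writer.writerows(rows)

# Read the CSV back; every field comes back as the text that was written
buffer.seek(0)
zip_codes = [row["ZipCode"] for row in csv.DictReader(buffer)]

print(zip_codes)
```

The leading zero and the hyphen both survive, because nothing in the file marks the column as numeric.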