How to generate new variable from stored results: r(mean)? - special-characters

I am trying to generate a new variable using the stored result r(mean) from the command sum.
I have a continuous variable 'age'.
So,
sum age
g age0=age–r(mean)
The problem is that this gives the error
unknown function age–r()
r(133);.

Works for me:
. sysuse auto, clear
(1978 Automobile Data)
. su mpg
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
mpg | 74 21.2973 5.785503 12 41
. gen mpg0 = mpg - r(mean)
. su mpg0
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
mpg0 | 74 -4.03e-08 5.785503 -9.297297 19.7027
Watch for strange characters, i.e. that the minus sign really is a hyphen [sic].

Related

Substring function to extract part of the string

data = {'desc': ['ADRIAN PETER - ANN 80020355787C - 11 Baillon Pass.pdf', 'AILEEN MARCUS - ANC 800E15432922 - 5 Mandarin Way.pdf',
'AJITH SINGH - ANN 80020837750 - 11 Berkeley Loop.pdf', 'ALEX MARTIN-CURTIS - ANC 80021710355 - 26 Dovedale St.pdf',
'Alice.Smith\Jodee - Karen - ANE 80020428377 - 58 Harrisdale Dr.pdf']}
df = pd.DataFrame(data, columns = ['desc'])
df
From the data frame, I want to create a new column called ID, and in that ID, I want to have only those values starting after ANN, ANC or ANE. So I am expecting a result as below.
ID
80020355787C
800E15432922
80020837750
80021710355
80020428377
I tried running the code below, but it did not get the desired result. Appreciate your help on this.
df['id'] = df['desc'].str.extract(r'\-([^|]+)\-')
You can use - AN[NCE] (800[0-9A-Z]+) -, where:
AN[NCE] matches literally AN followed by N or C or E;
800[0-9A-Z]+ matches literally 800 followed by one or more characters between 0 and 9 or between A and Z.
>>> df['desc'].str.extract(r'- AN[NCE] (800[0-9A-Z]+) -')
0
0 80020355787C
1 800E15432922
2 80020837750
3 80021710355
4 80020428377
If not all your ids start with "800", you can just remove it from the pattern.

Encoding Spotify URI to Spotify Codes

Spotify Codes are little barcodes that allow you to share songs, artists, users, playlists, etc.
They encode information in the different heights of the "bars". There are 8 discrete heights that the 23 bars can be, which means 8^23 different possible barcodes.
Spotify generates barcodes based on their URI schema. This URI spotify:playlist:37i9dQZF1DXcBWIGoYBM5M gets mapped to this barcode:
The URI has a lot more information (62^22) in it than the code. How would you map the URI to the barcode? It seems like you can't simply encode the URI directly. For more background, see my "answer" to this question: https://stackoverflow.com/a/62120952/10703868
The patent explains the general process, this is what I have found.
This is a more recent patent
When using the Spotify code generator the website makes a request to https://scannables.scdn.co/uri/plain/[format]/[background-color-in-hex]/[code-color-in-text]/[size]/[spotify-URI].
Using Burp Suite, when scanning a code through Spotify the app sends a request to Spotify's API: https://spclient.wg.spotify.com/scannable-id/id/[CODE]?format=json where [CODE] is the media reference that you were looking for. This request can be made through python but only with the [TOKEN] that was generated through the app as this is the only way to get the correct scope. The app token expires in about half an hour.
import requests
head={
"X-Client-Id": "58bd3c95768941ea9eb4350aaa033eb3",
"Accept-Encoding": "gzip, deflate",
"Connection": "close",
"App-Platform": "iOS",
"Accept": "*/*",
"User-Agent": "Spotify/8.5.68 iOS/13.4 (iPhone9,3)",
"Accept-Language": "en",
"Authorization": "Bearer [TOKEN]",
"Spotify-App-Version": "8.5.68"}
response = requests.get('https://spclient.wg.spotify.com:443/scannable-id/id/26560102031?format=json', headers=head)
print(response)
print(response.json())
Which returns:
<Response [200]>
{'target': 'spotify:playlist:37i9dQZF1DXcBWIGoYBM5M'}
So 26560102031 is the media reference for your playlist.
The patent states that the code is first detected and then possibly converted into 63 bits using a Gray table. For example 361354354471425226605 is encoded into 010 101 001 010 111 110 010 111 110 110 100 001 110 011 111 011 011 101 101 000 111.
However the code sent to the API is 6875667268, I'm unsure how the media reference is generated but this is the number used in the lookup table.
The reference contains the integers 0-9 compared to the gray table of 0-7 implying that an algorithm using normal binary has been used. The patent talks about using a convolutional code and then the Viterbi algorithm for error correction, so this may be the output from that. Something that is impossible to recreate whithout the states I believe. However I'd be interested if you can interpret the patent any better.
This media reference is 10 digits however others have 11 or 12.
Here are two more examples of the raw distances, the gray table binary and then the media reference:
1.
022673352171662032460
000 011 011 101 100 010 010 111 011 001 100 001 101 101 011 000 010 011 110 101 000
67775490487
2.
574146602473467556050
111 100 110 001 110 101 101 000 011 110 100 010 110 101 100 111 111 101 000 111 000
57639171874
edit:
Some extra info:
There are some posts online describing how you can encode any text such as spotify:playlist:HelloWorld into a code however this no longer works.
I also discovered through the proxy that you can use the domain to fetch the album art of a track above the code. This suggests a closer integration of Spotify's API and this scannables url than previously thought. As it not only stores the URIs and their codes but can also validate URIs and return updated album art.
https://scannables.scdn.co/uri/800/spotify%3Atrack%3A0J8oh5MAMyUPRIgflnjwmB
Your suspicion was correct - they're using a lookup table. For all of the fun technical details, the relevant patent is available here: https://data.epo.org/publication-server/rest/v1.0/publication-dates/20190220/patents/EP3444755NWA1/document.pdf
Very interesting discussion. Always been attracted to barcodes so I had to take a look. I did some analysis of the barcodes alone (didn't access the API for the media refs) and think I have the basic encoding process figured out. However, based on the two examples above, I'm not convinced I have the mapping from media ref to 37-bit vector correct (i.e. it works in case 2 but not case 1). At any rate, if you have a few more pairs, that last part should be simple to work out. Let me know.
For those who want to figure this out, don't read the spoilers below!
It turns out that the basic process outlined in the patent is correct, but lacking in details. I'll summarize below using the example above. I actually analyzed this in reverse which is why I think the code description is basically correct except for step (1), i.e. I generated 45 barcodes and all of them matched had this code.
1. Map the media reference as integer to 37 bit vector.
Something like write number in base 2, with lowest significant bit
on the left and zero-padding on right if necessary.
57639171874 -> 0100010011101111111100011101011010110
2. Calculate CRC-8-CCITT, i.e. generator x^8 + x^2 + x + 1
The following steps are needed to calculate the 8 CRC bits:
Pad with 3 bits on the right:
01000100 11101111 11110001 11010110 10110000
Reverse bytes:
00100010 11110111 10001111 01101011 00001101
Calculate CRC as normal (highest order degree on the left):
-> 11001100
Reverse CRC:
-> 00110011
Invert check:
-> 11001100
Finally append to step 1 result:
01000100 11101111 11110001 11010110 10110110 01100
3. Convolutionally encode the 45 bits using the common generator
polynomials (1011011, 1111001) in binary with puncture pattern
110110 (or 101, 110 on each stream). The result of step 2 is
encoded using tail-biting, meaning we begin the shift register
in the state of the last 6 bits of the 45 long input vector.
Prepend stream with last 6 bits of data:
001100 01000100 11101111 11110001 11010110 10110110 01100
Encode using first generator:
(a) 100011100111110100110011110100000010001001011
Encode using 2nd generator:
(b) 110011100010110110110100101101011100110011011
Interleave bits (abab...):
11010000111111000010111011110011010011110001...
1010111001110001000101011000010110000111001111
Puncture every third bit:
111000111100101111101110111001011100110000100100011100110011
4. Permute data by choosing indices 0, 7, 14, 21, 28, 35, 42, 49,
56, 3, 10..., i.e. incrementing 7 modulo 60. (Note: unpermute by
incrementing 43 mod 60).
The encoded sequence after permuting is
111100110001110101101000011110010110101100111111101000111000
5. The final step is to map back to bar lengths 0 to 7 using the
gray map (000,001,011,010,110,111,101,100). This gives the 20 bar
encoding. As noted before, add three bars: short one on each end
and a long one in the middle.
UPDATE: I've added a barcode (levels) decoder (assuming no errors) and an alternate encoder that follows the description above rather than the equivalent linear algebra method. Hopefully that is a bit more clear.
UPDATE 2: Got rid of most of the hard-coded arrays to illustrate how they are generated.
The linear algebra method defines the linear transformation (spotify_generator) and mask to map the 37 bit input into the 60 bit convolutionally encoded data. The mask is result of the 8-bit inverted CRC being convolutionally encoded. The spotify_generator is a 37x60 matrix that implements the product of generators for the CRC (a 37x45 matrix) and convolutional codes (a 45x60 matrix). You can create the generator matrix from an encoding function by applying the function to each row of an appropriate size generator matrix. For example, a CRC function that add 8 bits to each 37 bit data vector applied to each row of a 37x37 identity matrix.
import numpy as np
import crccheck
# Utils for conversion between int, array of binary
# and array of bytes (as ints)
def int_to_bin(num, length, endian):
if endian == 'l':
return [num >> i & 1 for i in range(0, length)]
elif endian == 'b':
return [num >> i & 1 for i in range(length-1, -1, -1)]
def bin_to_int(bin,length):
return int("".join([str(bin[i]) for i in range(length-1,-1,-1)]),2)
def bin_to_bytes(bin, length):
b = bin[0:length] + [0] * (-length % 8)
return [(b[i]<<7) + (b[i+1]<<6) + (b[i+2]<<5) + (b[i+3]<<4) +
(b[i+4]<<3) + (b[i+5]<<2) + (b[i+6]<<1) + b[i+7] for i in range(0,len(b),8)]
# Return the circular right shift of an array by 'n' positions
def shift_right(arr, n):
return arr[-n % len(arr):len(arr):] + arr[0:-n % len(arr)]
gray_code = [0,1,3,2,7,6,4,5]
gray_code_inv = [[0,0,0],[0,0,1],[0,1,1],[0,1,0],
[1,1,0],[1,1,1],[1,0,1],[1,0,0]]
# CRC using Rocksoft model:
# NOTE: this is not quite any of their predefined CRC's
# 8: number of check bits (degree of poly)
# 0x7: representation of poly without high term (x^8+x^2+x+1)
# 0x0: initial fill of register
# True: byte reverse data
# True: byte reverse check
# 0xff: Mask check (i.e. invert)
spotify_crc = crccheck.crc.Crc(8, 0x7, 0x0, True, True, 0xff)
def calc_spotify_crc(bin37):
bytes = bin_to_bytes(bin37, 37)
return int_to_bin(spotify_crc.calc(bytes), 8, 'b')
def check_spotify_crc(bin45):
data = bin_to_bytes(bin45,37)
return spotify_crc.calc(data) == bin_to_bytes(bin45[37:], 8)[0]
# Simple convolutional encoder
def encode_cc(dat):
gen1 = [1,0,1,1,0,1,1]
gen2 = [1,1,1,1,0,0,1]
punct = [1,1,0]
dat_pad = dat[-6:] + dat # 6 bits are needed to initialize
# register for tail-biting
stream1 = np.convolve(dat_pad, gen1, mode='valid') % 2
stream2 = np.convolve(dat_pad, gen2, mode='valid') % 2
enc = [val for pair in zip(stream1, stream2) for val in pair]
return [enc[i] for i in range(len(enc)) if punct[i % 3]]
# To create a generator matrix for a code, we encode each row
# of the identity matrix. Note that the CRC is not quite linear
# because of the check mask so we apply the lamda function to
# invert it. Given a 37 bit media reference we can encode by
# ref * spotify_generator + spotify_mask (mod 2)
_i37 = np.identity(37, dtype=bool)
crc_generator = [_i37[r].tolist() +
list(map(lambda x : 1-x, calc_spotify_crc(_i37[r].tolist())))
for r in range(37)]
spotify_generator = 1*np.array([encode_cc(crc_generator[r]) for r in range(37)], dtype=bool)
del _i37
spotify_mask = 1*np.array(encode_cc(37*[0] + 8*[1]), dtype=bool)
# The following matrix is used to "invert" the convolutional code.
# In particular, we choose a 45 vector basis for the columns of the
# generator matrix (by deleting those in positions equal to 2 mod 4)
# and then inverting the matrix. By selecting the corresponding 45
# elements of the convolutionally encoded vector and multiplying
# on the right by this matrix, we get back to the unencoded data,
# assuming there are no errors.
# Note: numpy does not invert binary matrices, i.e. GF(2), so we
# hard code the following 3 row vectors to generate the matrix.
conv_gen = [[0,1,0,1,1,1,1,0,1,1,0,0,0,1]+31*[0],
[1,0,1,0,1,0,1,0,0,0,1,1,1] + 32*[0],
[0,0,1,0,1,1,1,1,1,1,0,0,1] + 32*[0] ]
conv_generator_inv = 1*np.array([shift_right(conv_gen[(s-27) % 3],s) for s in range(27,72)], dtype=bool)
# Given an integer media reference, returns list of 20 barcode levels
def spotify_bar_code(ref):
bin37 = np.array([int_to_bin(ref, 37, 'l')], dtype=bool)
enc = (np.add(1*np.dot(bin37, spotify_generator), spotify_mask) % 2).flatten()
perm = [enc[7*i % 60] for i in range(60)]
return [gray_code[4*perm[i]+2*perm[i+1]+perm[i+2]] for i in range(0,len(perm),3)]
# Equivalent function but using CRC and CC encoders.
def spotify_bar_code2(ref):
bin37 = int_to_bin(ref, 37, 'l')
enc_crc = bin37 + calc_spotify_crc(bin37)
enc_cc = encode_cc(enc_crc)
perm = [enc_cc[7*i % 60] for i in range(60)]
return [gray_code[4*perm[i]+2*perm[i+1]+perm[i+2]] for i in range(0,len(perm),3)]
# Given 20 (clean) barcode levels, returns media reference
def spotify_bar_decode(levels):
level_bits = np.array([gray_code_inv[levels[i]] for i in range(20)], dtype=bool).flatten()
conv_bits = [level_bits[43*i % 60] for i in range(60)]
cols = [i for i in range(60) if i % 4 != 2] # columns to invert
conv_bits45 = np.array([conv_bits[c] for c in cols], dtype=bool)
bin45 = (1*np.dot(conv_bits45, conv_generator_inv) % 2).tolist()
if check_spotify_crc(bin45):
return bin_to_int(bin45, 37)
else:
print('Error in levels; Use real decoder!!!')
return -1
And example:
>>> levels = [5,7,4,1,4,6,6,0,2,4,3,4,6,7,5,5,6,0,5,0]
>>> spotify_bar_decode(levels)
57639171874
>>> spotify_barcode(57639171874)
[5, 7, 4, 1, 4, 6, 6, 0, 2, 4, 3, 4, 6, 7, 5, 5, 6, 0, 5, 0]

How to extract data set from a text file?

Am quite new in the Unix field and I am currently trying to extract data set from a text file. I tried with sed, grep, awk but it seems to only work with extracting lines, but I want to extract an entire dataset... Here is an example of file from which I'd like to extract the 2 data sets (figures after the lines "R.Time Intensity")
[Header]
Application Name LabSolutions
Version 5.87
Data File Name C:\LabSolutions\Data\Antoine\170921_AC_FluoSpectra\069_WT3a derivatized lignin LiCl 430_GPC_FOREVER_430_049.lcd
Output Date 2017-10-12
Output Time 12:07:32
[Configuration]
Instrument Name BOTAN127-Instrument1
Instrument # 1
Line # 1
# of Detectors 3
Detector ID Detector A Detector B PDA
Detector Name Detector A Detector B PDA
# of Channels 1 1 2
[LC Chromatogram(Detector A-Ch1)]
Interval(msec) 500
# of Points 9603
Start Time(min) 0,000
End Time(min) 80,017
Intensity Units mV
Intensity Multiplier 0,001
Ex. Wavelength(nm) 405
Em. Wavelength(nm) 430
R.Time (min) Intensity
0,00000 -709779
0,00833 -709779
0,01667 17
0,02500 3
0,03333 7
0,04167 19
0,05000 9
0,05833 5
0,06667 2
0,07500 24
0,08333 48
[LC Chromatogram(Detector B-Ch1)]
Interval(msec) 500
# of Points 9603
Start Time(min) 0,000
End Time(min) 80,017
Intensity Units mV
Intensity Multiplier 0,001
R.Time (min) Intensity
0,00000 149
0,00833 149
0,01667 -1
I would greatly appreciate any idea. Thanks in advance.
Antoine
awk '/^[^0-9]/&&d{d=0} /R.Time/{d=1}d' file
Brief explanation,
Set d as a flag to determine print line or not
/^[^0-9]/&&d{d=0}: if regex ^[^0-9] matched && d==1, disabled d
/R.Time/{d=1}: if string "R.Time" searched, enabled d
awk '/R.Time/,/LC/' file|grep -v -E "R.Time|LC"
grep part will remove the R.Time and LC lines that come as a part of the output from awk
I think it's a job for sed.
sed '/R.Time/!d;:A;N;/\n$/!bA' infile

how to use estat vif in the right way

I have 2 questions concerning estat vif to test multicollinearity:
Is it correct that you can only calculate estat vif after the regress command?
If I execute this command Stata only gives me the vif of one independent variable.
How do I get the vif of all the independent variables?
Q1. I find estat vif documented under regress postestimation. If you can find it documented under any other postestimation heading, then it is applicable after that command.
Q2. You don't give any examples, reproducible or otherwise, of your problem. But estat vif by default gives a result for each predictor (independent variable).
. sysuse auto, clear
(1978 Automobile Data)
. regress mpg weight price
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(2, 71) = 66.85
Model | 1595.93249 2 797.966246 Prob > F = 0.0000
Residual | 847.526967 71 11.9369995 R-squared = 0.6531
-------------+---------------------------------- Adj R-squared = 0.6434
Total | 2443.45946 73 33.4720474 Root MSE = 3.455
------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | -.0058175 .0006175 -9.42 0.000 -.0070489 -.0045862
price | -.0000935 .0001627 -0.57 0.567 -.000418 .0002309
_cons | 39.43966 1.621563 24.32 0.000 36.20635 42.67296
------------------------------------------------------------------------------
. estat vif
Variable | VIF 1/VIF
-------------+----------------------
price | 1.41 0.709898
weight | 1.41 0.709898
-------------+----------------------
Mean VIF | 1.41

How to calculate a Mod b in Casio fx-991ES calculator

Does anyone know how to calculate a Mod b in Casio fx-991ES Calculator. Thanks
This calculator does not have any modulo function. However there is quite simple way how to compute modulo using display mode ab/c (instead of traditional d/c).
How to switch display mode to ab/c:
Go to settings (Shift + Mode).
Press arrow down (to view more settings).
Select ab/c (number 1).
Now do your calculation (in comp mode), like 50 / 3 and you will see 16 2/3, thus, mod is 2. Or try 54 / 7 which is 7 5/7 (mod is 5).
If you don't see any fraction then the mod is 0 like 50 / 5 = 10 (mod is 0).
The remainder fraction is shown in reduced form, so 60 / 8 will result in 7 1/2. Remainder is 1/2 which is 4/8 so mod is 4.
EDIT:
As #lawal correctly pointed out, this method is a little bit tricky for negative numbers because the sign of the result would be negative.
For example -121 / 26 = -4 17/26, thus, mod is -17 which is +9 in mod 26. Alternatively you can add the modulo base to the computation for negative numbers: -121 / 26 + 26 = 21 9/26 (mod is 9).
EDIT2: As #simpatico pointed out, this method will not work for numbers that are out of calculator's precision. If you want to compute say 200^5 mod 391 then some tricks from algebra are needed. For example, using rule
(A * B) mod C = ((A mod C) * B) mod C we can write:
200^5 mod 391 = (200^3 * 200^2) mod 391 = ((200^3 mod 391) * 200^2) mod 391 = 98
As far as I know, that calculator does not offer mod functions.
You can however computer it by hand in a fairly straightforward manner.
Ex.
(1)50 mod 3
(2)50/3 = 16.66666667
(3)16.66666667 - 16 = 0.66666667
(4)0.66666667 * 3 = 2
Therefore 50 mod 3 = 2
Things to Note:
On line 3, we got the "minus 16" by looking at the result from line (2) and ignoring everything after the decimal. The 3 in line (4) is the same 3 from line (1).
Hope that Helped.
Edit
As a result of some trials you may get x.99991 which you will then round up to the number x+1.
You need 10 ÷R 3 = 1
This will display both the reminder and the quoitent
÷R
There is a switch a^b/c
If you want to calculate
491 mod 12
then enter 491 press a^b/c then enter 12. Then you will get 40, 11, 12. Here the middle one will be the answer that is 11.
Similarly if you want to calculate 41 mod 12 then find 41 a^b/c 12. You will get 3, 5, 12 and the answer is 5 (the middle one). The mod is always the middle value.
You can calculate A mod B (for positive numbers) using this:
Pol( -Rec( 1/2πr , 2πr × A/B ) , Y ) ( πr - Y ) B
Then press [CALC], and enter your values for A and B, and any value for Y.
/ indicates using the fraction key, and r means radians ( [SHIFT] [Ans] [2] )
type normal division first and then type shift + S->d
Here's how I usually do it. For example, to calculate 1717 mod 2:
Take 1717 / 2. The answer is 858.5
Now take 858 and multiply it by the mod (2) to get 1716
Finally, subtract the original number (1717) minus the number you got from the previous step (1716) -- 1717-1716=1.
So 1717 mod 2 is 1.
To sum this up all you have to do is multiply the numbers before the decimal point with the mod then subtract it from the original number.
Note: Math error means a mod m = 0
It all falls back to the definition of modulus: It is the remainder, for example, 7 mod 3 = 1.
This because 7 = 3(2) + 1, in which 1 is the remainder.
To do this process on a simple calculator do the following:
Take the dividend (7) and divide by the divisor (3), note the answer and discard all the decimals -> example 7/3 = 2.3333333, only worry about the 2. Now multiply this number by the divisor (3) and subtract the resulting number from the original dividend.
so 2*3 = 6, and 7 - 6 = 1, thus 1 is 7mod3
Calculate x/y (your actual numbers here), and press a b/c key, which is 3rd one below Shift key.
Simply just divide the numbers, it gives yuh the decimal format and even the numerical format. using S<->D
For example: 11/3 gives you 3.666667 and 3 2/3 (Swap using S<->D).
Here the '2' from 2/3 is your mod value.
Similarly 18/6 gives you 14.833333 and 14 5/6 (Swap using S<->D).
Here the '5' from 5/6 is your mod value.