How do I parse datetime in KDB Q? - kdb

I have minutely data:
t o h l c v
------------------------------------------------------
2016-01-04T09:00:00Z 105.45 105.45 103.6 103.6 17462
2016-01-04T09:03:00Z 103.7 103.99 103.7 103.99 893
2016-01-04T09:06:00Z 103.7 103.7 103.7 103.7 335
Which I've read in with:
f: `:/home/chris/sync/us_equities/AAPL.csv
show flip `t`o`h`l`c`v!("SFFFFI";",")0: f
I'm trying to work out how to parse the ISO8601 timestamp into something KDB understands. How should I do it?
This is my first time using q.

You can drop the last character (-1_) from each value using each-right (/:) and then parse ($) the result to a timestamp:
f: `:/home/chris/sync/us_equities/AAPL.csv
tab:flip `t`o`h`l`c`v!("*FFFFI";",")0: f
update "P"$-1_/:t from tab
Note that * should be used for generic text data rather than S, which reads the column as symbols.
https://code.kx.com/q/ref/tok/#unix-timestamps
https://code.kx.com/q/ref/maps/#each-left-and-each-right
https://code.kx.com/q/ref/drop/
https://code.kx.com/q/basics/datatypes/#strings
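For readers who think more easily in Python, the same "drop the trailing Z, then parse" idea can be sketched as below. This is only an illustration of the technique, not a q equivalent; the helper name parse_iso_utc is made up for this example, and the sample values are the three rows from the question.

```python
from datetime import datetime


def parse_iso_utc(stamp: str) -> datetime:
    """Drop the trailing 'Z' and parse the remainder as a naive timestamp.

    Hypothetical helper: mirrors the q idea of -1_ (drop last char) then "P"$ (parse).
    """
    if not stamp.endswith("Z"):
        raise ValueError(f"expected a trailing 'Z': {stamp!r}")
    return datetime.strptime(stamp[:-1], "%Y-%m-%dT%H:%M:%S")


# The three timestamps shown in the question.
rows = ["2016-01-04T09:00:00Z", "2016-01-04T09:03:00Z", "2016-01-04T09:06:00Z"]
parsed = [parse_iso_utc(r) for r in rows]
print(parsed[0])  # 2016-01-04 09:00:00
```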

If you're using kdb v4.0 or greater you can parse it directly as type "P":
q)("PFFFFI";1#",")0:f
t o h l c v
--------------------------------------------------------------
2016.01.04D09:00:00.000000000 105.45 105.45 103.6 103.6 17462
2016.01.04D09:03:00.000000000 103.7 103.99 103.7 103.99 893
2016.01.04D09:06:00.000000000 103.7 103.7 103.7 103.7 335
For lower kdb versions you have to do as rianoc suggested.

Haskell: How to pretty print number of seconds as a date and time?

I have an integral value that is the number of seconds since the Epoch. I can output it as a big integer, but I want to show it as a human-readable date and time.
For example:
secToTimestamp :: Int32 -> [Char]
which returns something like:
2016-01-01 14:11:11
In the interests of having a simple time-based solution (since time is the de facto package for manipulating anything time-related):
import Data.Int (Int32)
import Data.Time.Clock.POSIX
import Data.Time.Format
secToTimestamp :: Int32 -> String
secToTimestamp = formatTime defaultTimeLocale "%F %X" . posixSecondsToUTCTime . fromIntegral
Another possibility is the unix-time package:
{-# LANGUAGE OverloadedStrings #-}
import Prelude
import qualified Data.ByteString.Char8 as B
import Data.UnixTime
import Data.Int
import Data.Functor
-- Pure version: formats the time in GMT.
secToTimestampGMT :: Int32 -> [Char]
secToTimestampGMT t = B.unpack $ formatUnixTimeGMT "%Y-%m-%d %H:%M:%S" $ UnixTime (fromIntegral t) 0
-- IO version: formats the time in the local time zone.
secToTimestamp :: Int32 -> IO [Char]
secToTimestamp t = B.unpack <$> (formatUnixTime "%Y-%m-%d %H:%M:%S" $ UnixTime (fromIntegral t) 0)
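As a cross-check on the expected output, here is a minimal Python sketch of the same conversion, using only the standard library. The function name sec_to_timestamp and the sample value 1451657471 are assumptions chosen for illustration; that value is the number of seconds that corresponds to the example timestamp in the question when read as UTC.

```python
from datetime import datetime, timezone


def sec_to_timestamp(seconds: int) -> str:
    """Format seconds since the Unix epoch as 'YYYY-MM-DD HH:MM:SS' in UTC."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# Illustrative input: 2016-01-01 00:00:00 UTC is 1451606400, plus 14h 11m 11s.
print(sec_to_timestamp(1451657471))  # 2016-01-01 14:11:11
```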

Compare contrasts in linear model in Python (like R's contrast library?)

In R I can do the following to compare two contrasts from a linear model:
url <- "https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/spider_wolff_gorb_2013.csv"
filename <- "spider_wolff_gorb_2013.csv"
install.packages("downloader", repos="http://cran.us.r-project.org")
library(downloader)
if (!file.exists(filename)) download(url, filename)
spider <- read.csv(filename, skip=1)
head(spider, 5)
# leg type friction
# 1 L1 pull 0.90
# 2 L1 pull 0.91
# 3 L1 pull 0.86
# 4 L1 pull 0.85
# 5 L1 pull 0.80
fit = lm(friction ~ type + leg, data=spider)
fit
# Call:
# lm(formula = friction ~ type + leg, data = spider)
#
# Coefficients:
# (Intercept) typepush legL2 legL3 legL4
# 1.0539 -0.7790 0.1719 0.1605 0.2813
install.packages("contrast", repos="http://cran.us.r-project.org")
library(contrast)
l4vsl2 = contrast(fit, list(leg="L4", type="pull"), list(leg="L2",type="pull"))
l4vsl2
# lm model parameter contrast
#
# Contrast S.E. Lower Upper t df Pr(>|t|)
# 0.1094167 0.04462392 0.02157158 0.1972618 2.45 277 0.0148
I have found out how to do much of the above in Python:
import pandas as pd
df = pd.read_table("https://raw.githubusercontent.com/genomicsclass/dagdata/master/inst/extdata/spider_wolff_gorb_2013.csv", sep=",", skiprows=1)
df.head(2)
import statsmodels.formula.api as sm
model1 = sm.ols(formula='friction ~ type + leg', data=df)
fitted1 = model1.fit()
print(fitted1.summary())
Now all that remains is finding the t-statistic for the contrast of leg pair L4 vs. leg pair L2. Is this possible in Python?
statsmodels is still missing some predefined contrasts, but the t_test and wald_test or f_test methods of the model Results classes can be used to test linear (or affine) restrictions. The restrictions can be given either as arrays or as strings using the parameter names.
Details on how to specify contrasts/restrictions should be in the documentation. For example:
>>> tt = fitted1.t_test("leg[T.L4] - leg[T.L2]")
>>> print(tt.summary())
Test for Constraints
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
c0 0.1094 0.045 2.452 0.015 0.022 0.197
==============================================================================
The results are attributes or methods in the instance that is returned by t_test. For example the conf_int can be obtained by
>>> tt.conf_int()
array([[ 0.02157158, 0.19726175]])
t_test is vectorized and treats each restriction or contrast as a separate hypothesis. wald_test treats a list of restrictions as a joint hypothesis:
>>> tt = fitted1.t_test(["leg[T.L3] - leg[T.L2], leg[T.L4] - leg[T.L2]"])
>>> print(tt.summary())
Test for Constraints
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
c0 -0.0114 0.043 -0.265 0.792 -0.096 0.074
c1 0.1094 0.045 2.452 0.015 0.022 0.197
==============================================================================
>>> tt = fitted1.wald_test(["leg[T.L3] - leg[T.L2], leg[T.L4] - leg[T.L2]"])
>>> print(tt.summary())
<F test: F=array([[ 8.10128575]]), p=0.00038081249480917173, df_denom=277, df_num=2>
Aside: this also works for robust covariance matrices if cov_type was specified as argument to fit.
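To see where the numbers in the L4 vs. L2 contrast come from, the arithmetic can be checked by hand. The sketch below only reuses values printed earlier in this thread (the R coefficients and the contrast's estimate and standard error); it does not call statsmodels, and the variable names are made up for illustration.

```python
# Coefficients as printed by R's lm() in the question (rounded to 4 decimals).
coef_l4 = 0.2813
coef_l2 = 0.1719

# With type held fixed, the L4-vs-L2 contrast is just the coefficient difference.
contrast_from_coefs = coef_l4 - coef_l2

# Estimate and standard error as printed by R's contrast() in the question.
contrast = 0.1094167
std_err = 0.04462392

# The t statistic is the estimate divided by its standard error.
t_stat = contrast / std_err

print(round(contrast_from_coefs, 4), round(t_stat, 3))  # 0.1094 2.452
```

The standard error itself cannot be recovered from the coefficients alone, since it depends on the covariance between the two estimates, which is what t_test takes care of.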

pg_dump from 9.1.7 to 9.1.11

I would like to import a table from another PostgreSQL database (9.1.7) into mine, which is 9.1.11. I tried to import the dump but I get a bunch of syntax errors; I'm assuming there is some issue with the version mismatch?
Is there a better solution than downgrading my PostgreSQL installation to match the input file?
This is the command I used to export the database on the 9.1.7 system:
pg_dump superdb -U tester -a -t guidedata > /tmp/guidedata.sql
This is the command I used to import the dump file guidedata.sql
psql linuxdb -U tester -h localhost < guidedata.sql
This is the top portion of the database dump file which I am attempting to import:
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET search_path = public, pg_catalog;
--
-- Data for Name: epg; Type: TABLE DATA; Schema: public; Owner: spy
--
COPY epg (id, channel, sdate, stime, duration, stitle, ltitle, theme, sdesc, ldesc, mpaa, rating, stereo, surround, sap, closedcaptioned, animated, blackwhite, rerun, live, ismovie, nudity, language, violence, adulttheme, halfstars, field1) FROM stdin;
90056520 AMC 01092014 0100 270 Titanic Titanic 8,15 A woman falls for an artist aboard the ill-fated ship. Leonardo DiCaprio, Kate Winslet (1997) A society girl abandons her haughty fiance for a penniless artist on the ill-fated ship's maiden voyage. (3:15) MPAAPG13 TVPG f f f t f f t f t t t t t 8 f
90056521 AMC 01092014 0530 180 Love Actually Love Actually 23,15 Various people deal with relationships in London. Hugh Grant, Laura Linney (2003) A prime minister, an office worker, a pop star, a jilted writer, married couples and various others deal with relationships in London. (2:15) MPAAR TVPG f f t t f f t f t t t f t 6 f
90056522 AMC 01092014 0830 150 Four Four Weddings and a Funeral 23,15 An English charmer meets a lusty American. Hugh Grant, Andie MacDowell (1994) An English charmer and a lusty American make love over a course of surprising events. (1:56) MPAAR TV14 f f t t f f t f t f t f t 7 f
90056523 AMC 01092014 1100 30 Paid Prog. Paid Programming 0 Paid programming. Paid programming. f f f f f f t f f f f f f 0 f
90056524 AMC 01092014 1130 30 Williams Montel Williams 19 Living well with Montel and the effects of identity theft. Living well with Montel and the devastating effects of identity theft. f f f f f f t f f f f f f 0 f
90056525 AMC 01092014 1200 30 Cindy Crawford Cindy Crawford Reveals Secret to Ageless Beauty 19 Cindy Crawford's skin secret with Meaningful Beauty. Cindy Crawford's supermodel secret to youthful, radiant-looking skin with Meaningful Beauty. f f f f f f t f f f f f f 0 f
90056526 AMC 01092014 1230 30 More Sex More Sex, Less Stress 19 Androzene promotes male sexual health & nourishes the body. Androzene promotes male sexual health and nourishes the body. f f f f f f t f f f f f f 0 f
90056527 AMC 01092014 1300 30 WEN Hair Care WEN by Chaz Dean Revolutionary Hair Care System 19 WEN by Chaz Dean is revolutionary hair care. WEN by Chaz Dean is revolutionary hair care that cleans and conditions without many shampoo's harsh detergents or sulfates. Natural ingredients help make hair shinier, fuller, softer and more manageable! By trusted GuthyRenker. f f f f f f t f f f f f f 0 f
90056528 AMC 01092014 1330 30 Medicare Looking for a Medicare plan? Tune in now! 19 Watch and learn about Humana Medicare Advantage plans. Watch and learn about Humana Medicare Advantage plans. f f f f f f t f f f f f f 0 f
90056529 AMC 01092014 1400 5 Stooges The Three Stooges 6 The caveman boys meet cavewomen. Moe Howard, Larry Fine ''I'm a Monkey's Unc
Here is some of the error output I see on the console when attempting to import:
ERROR: syntax error at or near "Route"
LINE 1: Route 66 renovation gives Ron a change of heart. TVPG t f f...
^
ERROR: syntax error at or near "Ron"
LINE 1: Ron and Jason bring out the Pontiac GTO. TVPG t f f t f f t...
^
ERROR: syntax error at or near "repairing"
LINE 1: repairing the clutch and drive shaft on the 1995 BMW. TVPG ...
^
ERROR: syntax error at or near "classic"
LINE 1: classic Bucik;
Thanks
My guess - one system is linux/unix/mac and the other is Windows.
It's complaining about the first line of the block of data because there is a stray carriage-return (\r) character.
Solution: use the "custom" format (recommended unless you have a good reason not to) by adding -Fc or --format=custom to your dump command. Note that a custom-format dump is restored with pg_restore rather than piped into psql. That should sort it.
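If you want to test the carriage-return guess before re-dumping, a small check like the following could help. It is a sketch under assumptions: the helper names are invented, and the sample bytes are a made-up miniature of a plain-text dump with Windows line endings, not the real file.

```python
def count_carriage_returns(data: bytes) -> int:
    """Count stray carriage-return bytes in the dump contents."""
    return data.count(b"\r")


def strip_carriage_returns(data: bytes) -> bytes:
    """Turn CRLF line endings into LF; all other bytes are left untouched."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical sample: a tiny COPY block saved with CRLF line endings.
sample = b"COPY epg (id, channel) FROM stdin;\r\n90056520\tAMC\r\n\\.\r\n"
cleaned = strip_carriage_returns(sample)

print(count_carriage_returns(sample), count_carriage_returns(cleaned))  # 3 0
```

A non-zero count on the real dump file (read in binary mode) would support the line-ending explanation.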

Erlang identify umlauts

How can I identify German umlauts in Erlang? I have been trying for days now; when I read a text as a list, it just doesn't recognise them. For example, I tried this:
change_umlaut(Word) -> change_umlaut(lists:reverse(Word), []).
change_umlaut([],Acc) -> Acc;
change_umlaut([H|T],Acc) ->
if
%extended ascii characters
H =:= 129 -> change_umlaut(T, ["ue"|Acc]);
H =:= 132 -> change_umlaut(T, ["ae"|Acc]);
H =:= 148 -> change_umlaut(T, ["oe"|Acc]);
%extended ascii characters
H == 129 -> change_umlaut(T, ["ue"|Acc]);
H == 132 -> change_umlaut(T, ["ae"|Acc]);
H == 148 -> change_umlaut(T, ["oe"|Acc]);
%literals
H == "ü" -> change_umlaut(T, ["ue"|Acc]);
H == "ä" -> change_umlaut(T, ["ae"|Acc]);
H == "ö" -> change_umlaut(T, ["oe"|Acc]);
%else
true -> change_umlaut(T, [H|Acc])
end;
it just passes all the arguments without matching until true...
Thank you for your help.
In Erlang, strings usually contain Latin-1 or Unicode codepoints, so you should be looking for 228 for "ä", 246 for "ö" and 252 for "ü".
Your literals section should have made this work transparently, except for the fact that H is a single character, and you're comparing it to strings ("ü", "ä" and "ö"). The corresponding character literals are $ü, $ä and $ö - make sure that your source file is saved as Latin-1 for this to work.
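The codepoint values quoted above are easy to confirm outside Erlang. The Python sketch below checks them and performs the same replacement by comparing integer codepoints rather than strings; the names UMLAUTS and change_umlaut are illustrative only, and the sample word is an assumption. Escapes are used instead of literal umlauts so the result does not depend on the source file's encoding.

```python
# Codepoints from the answer: 228 = a-umlaut, 246 = o-umlaut, 252 = u-umlaut.
UMLAUTS = {228: "ae", 246: "oe", 252: "ue"}


def change_umlaut(word: str) -> str:
    """Replace each umlaut with its two-letter form, comparing codepoints."""
    return "".join(UMLAUTS.get(ord(ch), ch) for ch in word)


# "\u00e4\u00f6\u00fc" is the three lowercase umlauts written as escapes.
print([ord(ch) for ch in "\u00e4\u00f6\u00fc"])  # [228, 246, 252]
print(change_umlaut("f\u00fcr"))  # fuer
```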

DNA to RNA and Getting Proteins with Perl

I am working on a project (I have to implement it in Perl, but I am not good at it) that reads DNA and finds its RNA, then divides that RNA into triplets to get the equivalent protein name for each. I will explain the steps:
1) Transcribe the following DNA to RNA, then use the genetic code to translate it to a sequence of amino acids
Example:
TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT
2) To transcribe the DNA, first substitute each DNA base for its counterpart (i.e., G for C, C for G, T for A and A for T):
TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT
AGTATTATGCAAAACATAAGCGGTCGCGAAGCCACA
Next, remember that the Thymine (T) bases become a Uracil (U). Hence our sequence becomes:
AGUAUUAUGCAAAACAUAAGCGGUCGCGAAGCCACA
To use the genetic code, first split the RNA into triplets:
AGU AUU AUG CAA AAC AUA AGC GGU CGC GAA GCC ACA
Then look each triplet (codon) up in the genetic code table. So AGU becomes Serine, which we can write as Ser, or just S. AUU becomes Isoleucine (Ile), which we write as I. Carrying on in this way, we get:
SIMQNISGREAT
I will give the protein table:
So how can I write this in Perl? I will edit my question to add the code I have written so far.
Try the script below; it accepts input on STDIN (or from a file given as a parameter) and reads it line by line. I also presume that "STOP" in the attached image is some stop state. I hope I read everything correctly from that picture.
#!/usr/bin/perl
use strict;
use warnings;
my %proteins = qw/
UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
/;
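LINE: while (<>) {
    chomp;
    # Complement each base and turn T into U in one pass (steps 1 and 2 combined).
    y/GCTA/CGAU/;
    # Walk the RNA three bases at a time; a trailing partial codon is ignored.
    foreach my $codon (/(...)/g) {
        if (defined $proteins{$codon}) {
            print $proteins{$codon};
        }
        else {
            # Codons missing from the table (UAA, UAG, UGA) are treated as stop.
            print "Whoops, stop state?\n";
            next LINE;
        }
    }
    print "\n";
}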
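For comparison, the same steps can be sketched in Python. This is a rough port under assumptions, not a replacement for the Perl script: the codon table is copied from the script above, the function name translate is made up, and on an unknown (stop) codon this version simply stops and returns what it has so far instead of printing a message.

```python
# Codon table copied from the Perl script; stop codons are deliberately absent.
_TOKENS = """
UUU F UUC F UUA L UUG L UCU S UCC S UCA S UCG S UAU Y UAC Y UGU C UGC C UGG W
CUU L CUC L CUA L CUG L CCU P CCC P CCA P CCG P CAU H CAC H CAA Q CAG Q CGU R CGC R CGA R CGG R
AUU I AUC I AUA I AUG M ACU T ACC T ACA T ACG T AAU N AAC N AAA K AAG K AGU S AGC S AGA R AGG R
GUU V GUC V GUA V GUG V GCU A GCC A GCA A GCG A GAU D GAC D GAA E GAG E GGU G GGC G GGA G GGG G
""".split()
CODON_TABLE = dict(zip(_TOKENS[0::2], _TOKENS[1::2]))

# Complement each base and replace T with U in one step, like y/GCTA/CGAU/.
DNA_TO_RNA = str.maketrans("GCTA", "CGAU")


def translate(dna: str) -> str:
    """Transcribe DNA to RNA, then map each full codon to its one-letter code."""
    rna = dna.strip().translate(DNA_TO_RNA)
    amino_acids = []
    # Step through complete triplets only; a trailing partial codon is ignored.
    for i in range(0, len(rna) - len(rna) % 3, 3):
        letter = CODON_TABLE.get(rna[i:i + 3])
        if letter is None:  # codon not in the table: treat as a stop state
            break
        amino_acids.append(letter)
    return "".join(amino_acids)


print(translate("TCATAATACGTTTTGTATTCGCCAGCGCTTCGGTGT"))  # SIMQNISGREAT
```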