Date in Pig Latin

I am trying to do the following: I have multiple dates, and I want to create a Pig script that accepts an unknown number of input dates and then runs over those input arguments. My question is:
How can I pass an unknown number of input variables to a Pig script and then handle them within the script?
Thanks
Sara

I have some trouble understanding what you actually want to do. This would be my solution for your problem, sending an unknown number of dates (stored as chararray):
A = load 'input_dates' AS (date:chararray);
B = my_macro(A);
It's quite basic, so I guess I didn't understand your problem correctly. Could you maybe elaborate on your problem a little more?
UPDATE: How about something like this if you use Pig 0.11? (There is a bug with module imports until 0.10.)
#!/usr/bin/python
import os
from org.apache.pig.scripting import *
P = Pig.compile("""
data = LOAD '$docs_in' AS (a:int);
-- do something
""")
# build one parameter binding per file in the dates directory
lof = os.listdir("/home/.../dates/")
params = []
for elem in lof:
    params.append({'docs_in': str(elem)})

bound = P.bind(params)
stats = bound.run()
If each run depends on the result of the previous one, use runSingle() instead.
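For example, continuing the script above, here is a minimal sketch of such chained runs; the $docs_out parameter, the STORE line, and the seed input name are hypothetical additions for illustration:
P = Pig.compile("""
data = LOAD '$docs_in' AS (a:int);
-- do something
STORE data INTO '$docs_out';
""")

prev = 'seed_input'
for elem in sorted(os.listdir("/home/.../dates/")):
    out = elem + '_out'
    # bind and run one job at a time so each run can read the previous output
    stats = P.bind({'docs_in': prev, 'docs_out': out}).runSingle()
    if not stats.isSuccessful():
        raise RuntimeError('Pig run failed for ' + elem)
    prev = out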

If I understand the question correctly, you want to load a number of files or directories. You can pass them as a single comma-separated list.
Below is an example:
load.pig (content):
A = LOAD '$input' using PigStorage();
dump A;
Command to run locally:
pig -x local -param input=20120301,20120302,20120304 load.pig
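If the set of dates is not known in advance, the comma-separated list can be built programmatically before invoking Pig; a minimal sketch in Python (the directory path is a placeholder):
import os
import subprocess

dates_dir = '/home/user/dates'                   # placeholder path
dates = ','.join(sorted(os.listdir(dates_dir)))  # e.g. 20120301,20120302,...
subprocess.check_call(
    ['pig', '-x', 'local', '-param', 'input=' + dates, 'load.pig'])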


How to insert similar value into multiple locations of a psycopg2 query statement using dict? [duplicate]

I have a Python script that runs a pgSQL file through SQLAlchemy's connection.execute function. Here's the block of code in Python:
results = pg_conn.execute(sql_cmd, beg_date = datetime.date(2015,4,1), end_date = datetime.date(2015,4,30))
And here's one of the areas where the variable gets inputted in my SQL:
WHERE
( dv.date >= %(beg_date)s AND
dv.date <= %(end_date)s)
When I run this, I get a cryptic Python error:
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) argument formats can't be mixed
…followed by a huge dump of the offending SQL query. I've run this exact code with the same variable convention before. Why isn't it working this time?
I encountered a similar issue as Nikhil. I have a query with LIKE clauses which worked until I modified it to include a bind variable, at which point I received the following error:
DatabaseError: Execution failed on sql '...': argument formats can't be mixed
The solution is not to give up on the LIKE clause; it would be pretty surprising if psycopg2 simply didn't permit LIKE clauses. Rather, we can escape the literal % as %%. For example, the following query:
SELECT *
FROM people
WHERE start_date > %(beg_date)s
AND name LIKE 'John%';
would need to be modified to:
SELECT *
FROM people
WHERE start_date > %(beg_date)s
AND name LIKE 'John%%';
More details in the psycopg2 docs: http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
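For completeness, a minimal sketch of the escaped query in use with psycopg2 (the connection string and the people table are placeholders):
import datetime
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder DSN
cur = conn.cursor()
cur.execute(
    """
    SELECT *
    FROM people
    WHERE start_date > %(beg_date)s
      AND name LIKE 'John%%'   -- %% reaches the server as a literal %
    """,
    {'beg_date': datetime.date(2015, 4, 1)})
rows = cur.fetchall()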
As it turned out, I had used a SQL LIKE operator in the new SQL query, and its % wildcard was interfering with the driver's parameter escaping. For instance:
dv.device LIKE 'iPhone%' or
dv.device LIKE '%Phone'
Another answer offered a way to un-escape and re-escape, which I felt would add unnecessary complexity to otherwise simple code. Instead, I used pgSQL's ability to handle regex to modify the SQL query itself. This changed the above portion of the query to:
dv.device ~ E'iPhone.*' or
dv.device ~ E'.*Phone$'
So for others: you may need to change your LIKE operators to the regex operator ~ to get this to work. Just remember that it will be much slower for large queries.
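If you have many such clauses, the rewrite can be made mechanical; here is a small hypothetical helper (the function name is mine, it only handles the % wildcard, and the anchors reflect the fact that LIKE matches the entire string):
def like_to_regex(pattern):
    # translate a simple LIKE pattern into a POSIX regex for the ~ operator
    return '^' + pattern.replace('%', '.*') + '$'

print(like_to_regex('iPhone%'))   # ^iPhone.*$
print(like_to_regex('%Phone'))    # ^.*Phone$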
For me, it turned out I had a % in a SQL comment:
/* Any future change in the testing size will not require
a change here... even if we do a 100% test
*/
This works fine:
/* Any future change in the testing size will not require
a change here... even if we do a 100pct test
*/
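If a comment needs to keep its percent sign, the same %% escaping works there too, because psycopg2 interpolates parameters across the whole statement string, comments included; a minimal sketch (connection details and the people table are placeholders again):
import datetime
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder DSN
cur = conn.cursor()
cur.execute(
    """
    /* Any future change in the testing size will not require
       a change here... even if we do a 100%% test */
    SELECT * FROM people WHERE start_date > %(beg_date)s
    """,
    {'beg_date': datetime.date(2015, 4, 1)})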

General purpose Tuple in Hadoop

I'm new to Hadoop, so please don't judge my seemingly simple question too harshly.
The short version: what tuple data type can I use in Hadoop to store 2 longs as a single value in a sequence file?
Moreover, I want to be able to read and process this file with Apache Pig like A = LOAD '/my/file' AS (a:long, (b:long, c:long)) and with Scala & Spark like val a = sc.sequenceFile[LongWritable, DesiredTuple]("/my/file", 1).
The full story:
I'm writing a Hadoop job in Java, and I need to output a sequence file which contains 3 long values in each record. I use the first value as the key and group the two other values together as the value in my Reducer.
I tried several variants:
Using org.apache.hadoop.mapreduce.lib.join.TupleWritable
public class MyReducer extends Reducer<...> {
    public void reduce(Context context) {
        long a, b, c;
        // ...
        context.write(new LongWritable(a), new TupleWritable(
            new LongWritable[]{new LongWritable(b), new LongWritable(c)}));
    }
}
But the javadoc of the TupleWritable class says "This is not a general-purpose tuple type." It seemed OK for a first attempt, but I can't get my tuples back. Look at this simple script in Apache Pig:
A = LOAD '/my/file' USING org.apache.pig.piggybank.storage.SequenceFileLoader()
AS (a:long, (b:long, t:long));
DUMP A;
I got something like this:
(2220,)
(5640,)
(6240,)
...
So what is the Apache Pig way of reading Hadoop's TupleWritable from a sequence file?
Furthermore, I tried changing the sequence format to text format: job.setOutputFormatClass(TextOutputFormat.class);
This time I just looked at one of the output files:
> hdfs dfs -cat /my/file/part-r-00000 | head
2220 [,]
5640 [,]
6240 [,]
...
So the next question is: why is there nothing in my TupleWritable value?
After that, I tried org.apache.mahout.cf.taste.hadoop.EntityEntityWritable.
For a sequence file I got the same result as before:
grunt> A = LOAD '/my/file' USING org.apache.pig.piggybank.storage.SequenceFileLoader() AS (a:long, (b:long, c:long));
(2220,)
(5640,)
(6240,)
...
For a text file I got the desired result:
2220 2 15
5640 1 9
6240 0 1
...
And the next question is: how do I read such tuples (EntityEntityWritable, and maybe other custom objects) back from a Hadoop-written sequence file?
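For what it's worth, the text output above is easy to parse back in plain Python while the sequence-file question stands; a minimal sketch (the part file name comes from the listing above, and whitespace separation is assumed):
# parse the TextOutputFormat output shown above
with open('part-r-00000') as f:
    for line in f:
        a, b, c = (int(tok) for tok in line.split())
        # a is the key, (b, c) is the grouped pair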

For loop to open files in Python

I am relatively new to Python and need to run a Python macro through Abaqus. I am opening files, e.g. "nonsym1, nonsym2, nonsym3", and I'm trying to do this with a loop. The code opens nonsym1 (in Abaqus) and performs some operations on it, then is supposed to loop back and do the same to the other files. Here is the code I'm trying:
for i in range(1, 10):
    filename = 'nonsym(i)'
    step = mdb.openStep(
        'C:/Users/12345678/Documents/Inventor/Aortic Dissection/%s.stp' % filename,
        scaleFromFile=OFF)
My main issue is coming from the %s in the directory path, I think? I get an error message when trying to run this macro. I don't know how best to approach this, so any help would be great, thanks! Still learning!
Instead of using filename = nonsym1, nonsym2, ..., name the step files as integers (1.stp, 2.stp, 3.stp) and then interpolate the loop index into the path with %s and str(i).
And use the code below:
for i in range(1, 10):
    step = mdb.openStep(
        'C:/Users/12345678/Documents/Inventor/Aortic Dissection/%s.stp' % str(i),
        scaleFromFile=OFF)
To obtain an equal quantity of odb files, modify the Job code line in the same way.
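Alternatively, here is a minimal sketch that keeps the original nonsym1, nonsym2, ... names; the quoted 'nonsym(i)' in the question is a literal string, so the loop index has to be interpolated explicitly (the path is taken from the question, and this runs inside Abaqus like the code above):
for i in range(1, 10):
    filename = 'nonsym%d' % i   # builds nonsym1, nonsym2, ...
    step = mdb.openStep(
        'C:/Users/12345678/Documents/Inventor/Aortic Dissection/%s.stp' % filename,
        scaleFromFile=OFF)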

MATLAB missorting structure array when using dir command

I have a bunch of Excel data, called "1.xls", "2.xls"... until "15.xls", each with 141x44 sets of data. I am using the dir function to import the data into MATLAB.
Here I am importing the first and second columns from each file into the A and B matrices.
prob15 = dir(fullfile('C:\Users\Bo Sun\Documents\MATLAB\prob15', '*.xls'));
global A B
A = zeros(141, length(prob15));
B = zeros(141, length(prob15));
for i = 1:length(prob15)
    A(:,i) = xlsread(prob15(i).name, 'A:A');
    B(:,i) = xlsread(prob15(i).name, 'B:B');
end
My problem is, when I use the dir command, for some reason MATLAB missorts the data, in that the ascending order of the prob15 structure array will be "1.xls", "10.xls", "11.xls"... instead of normal ascending numerical order ("1.xls", "2.xls, ...). Anyone know how I could fix this? Thanks.
The order you are seeing is called ASCIIbetical order, and it is the normal sorting order for all kinds of utilities, evidently including your OS's directory listing program, since MATLAB just farms this command out to the OS.
If you want a numerical sort, you can convert the filename strings to numbers and sort those. Before I wrote it myself, some light googling yielded this, which you can easily adapt to your problem:
list = dir(fullfile(cd, '*.mat'));   % all .mat files in the current folder
name = {list.name};                  % cell array of file names
str = sprintf('%s#', name{:});       % join the names into one '#'-delimited string
num = sscanf(str, 'r_%d.mat#');      % extract the numeric part of each name
                                     % (for files like 1.xls, use '%d.xls#')
[dummy, index] = sort(num);          % numeric sort order
name = name(index);                  % file names in numeric order

Preparing a command with Structured Parameters

I have this ADO.NET command object and I can set some parameters and execute it successfully.
_mergecommand.Parameters.Add(new SqlParameter("values", SqlDbType.Structured));
_mergecommand.Parameters["values"].TypeName = "strlist";
_mergecommand.Parameters["values"].Direction = ParameterDirection.Input;
_mergecommand.Parameters["values"].Value = valuelist;
_mergecommand.ExecuteNonQuery();
This works fine. But I want to prepare this command before executing it because I need to run it millions of times. I am using SQL Server 2008. I get this error if I try to prepare it:
SqlCommand.Prepare method requires all variable length parameters to have an explicitly set non-zero Size.
Any idea how to do this?
This is old, but there does appear to be a correct answer, which is to use -1 as the size, e.g.:
_mergecommand.Parameters.Add(new SqlParameter("values", SqlDbType.Structured, -1));
If you have to do this millions of times, using a command like this is probably not a good strategy.
Can you serialize your data into an XML string and pass that as a single argument? That will be considerably less load on your network and SQL Server... although it will probably hit your client a lot harder.
If you are dead set on doing it that way, maybe what you are looking for is an overload of the SqlCommand.Parameters.Add method:
_mergecommand.Parameters.Add("#values", System.Data.SqlDbType.NVarChar, 100).Value = foo;
Is that more like what you wanted?