I wrote a macro that loops through several .root files of data collected hourly and merges them, so that the hourly files become daily files instead. For some reason it is creating several copies of the tree and all the information within it. For example, when I look at the tree containing the data from all the input trees, it shows up as "clusters_Tree; 61".
I am attaching my macro. Any idea how I could fix this?
#include "TChain.h"
#include "TTree.h"
#include "TParameter.h"
#include "TFile.h"
#include "TString.h"
#include "TList.h"
#include "TSystemDirectory.h"
#include "TSystemFile.h"
#include <iostream>
Double_t elow = 0.13;
Double_t ehigh = 100.;
void makeShort(TString year, TString month, TString day){
TChain* c = new TChain("clusters_tree");
TChain* d = new TChain("finfo");
int nFiles = 0;
double efact = 6.04E-3;
TString infolder = "/data/directory1/";
TString contains = year + month + day;
TString outfolder = "/data/directory1/";
TFile* fout = new TFile(outfolder+"/short_test"+contains+".root","RECREATE");
TSystemDirectory dir(infolder, infolder);
TList *files = dir.GetListOfFiles();
if (files){
TSystemFile *file;
TString fname;
TIter next(files);
while ((file=(TSystemFile*)next())) {
fname = file->GetName();
if (file->IsDirectory() && fname.Contains(contains)) {
nFiles += c->Add(infolder+fname+"/*.root");
d->Add(infolder+fname+"/*.root");
}
}
std::cout << "Found " << nFiles << " files" << std::endl;
}
TTree* details = new TTree("details","details");
details->Branch("nFiles",&nFiles);
details->Branch("conversion",&efact);
details->Fill(); // store the single entry; without Fill() the tree is written with no entries
TTree* t = c->CloneTree(0);
TParameter<double>* q = NULL;
c->SetBranchAddress("charge_total",&q);
Long64_t nentries = c->GetEntries(); // a chain over many files can exceed the Int_t range
for(Long64_t i=0; i<nentries; i++){
if(i%100000==0)
std::cout << "Processing cluster " << i << " of " << nentries << std::endl;
c->GetEntry(i);
Double_t e = q->GetVal()*efact;
if(e>elow && e<ehigh)
t->Fill();
}
TTree* f = d->CloneTree();
t->Write();
f->Write();
details->Write();
fout->Close();
}
You should really be using hadd. A default ROOT build should already have the binary.
That said, I see you are essentially filling a new tree by hand. The way to do this is to create a TChain and merge it to write the result back out (which is essentially what hadd does). The "clusters_Tree; 61" you see is not really a copy: these are known as cycles, and they are more like versions of the same object. I'm guessing you have 61 files (maybe 60)? The extra cycles are probably there because you use TTree::CloneTree(0) instead of TChain::Merge(...).
Basically, I am writing a program that works with large integer values that overflow C++ integer types. I am trying to compute something like gcd(pow(a, b), c), where a^b is the value that overflows the integer limit. Is there a way to do this without relying on big-integer libraries? If not, are there any recommended big-integer libraries?
We can use the property of the greatest common divisor that gcd(a, b) = gcd(a % b, b). Hence gcd(pow(a, b), c) = gcd(pow(a, b) % c, c) = gcd(powmod(a, b, c), c), where powmod() is modular exponentiation.
In my C++ code below, PowMod() is implemented using exponentiation by squaring.
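A quick way to convince yourself of the identity is to check it on small numbers where a^b still fits in 64 bits, comparing gcd of the full power with gcd of the reduced power. The helper names (Gcd, PowSmall, IdentityHolds) are illustrative and not part of the answer's code; PowSmall deliberately does no reduction, so it is only safe for tiny inputs.

```cpp
#include <cassert>
#include <cstdint>

// Euclid's algorithm on 64-bit values.
uint64_t Gcd(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Naive a^b without any modulus: only valid while the result fits in 64 bits.
uint64_t PowSmall(uint64_t a, uint64_t b) {
    uint64_t r = 1;
    while (b-- > 0) r *= a;
    return r;
}

// Checks gcd(a^b, c) == gcd(a^b % c, c) for one triple (requires c > 0).
bool IdentityHolds(uint64_t a, uint64_t b, uint64_t c) {
    uint64_t p = PowSmall(a, b);
    return Gcd(p, c) == Gcd(p % c, c);
}
```

For example, 6^3 = 216 and 216 mod 20 = 16, and both gcd(216, 20) and gcd(16, 20) equal 4. PowMod simply computes the reduced value directly, without ever forming a^b.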
#include <cstdint>
#include <iostream>
using Word = uint32_t;
using DWord = uint64_t;
Word GCD(Word a, Word b) {
Word t = 0;
while (b != 0) {
t = b;
b = a % b;
a = t;
}
return a;
}
Word PowMod(Word a, Word b, Word c) {
Word r = 1;
while (b != 0) {
if (b & 1)
r = (DWord(r) * a) % c;
a = (DWord(a) * a) % c;
b >>= 1;
}
return r;
}
int main() {
Word const
a = 2645680092U, b = 3562429202U, c = 3045001828U,
powmod = PowMod(a, b, c), gcd = GCD(powmod, c);
std::cout << "a = " << a << ", b = " << b
<< ", c = " << c << std::endl;
std::cout << "PowMod(a, b, c) = "
<< powmod << std::endl; // 592284924
std::cout << "GCD(PowMod(a, b, c), c) = "
<< gcd << std::endl; // 1892
}
Output:
a = 2645680092, b = 3562429202, c = 3045001828
PowMod(a, b, c) = 592284924
GCD(PowMod(a, b, c), c) = 1892
This gives correct results, which can be verified with the following simple Python program that produces the same values:
import random, math
random.seed(0)
bits = 32
while True:
c = random.randrange(1 << (bits - 1), 1 << bits)
a = random.randrange(1 << (bits - 1), 1 << bits) % c
b = random.randrange(1 << (bits - 1), 1 << bits)
pm = pow(a, b, c)
gcd = math.gcd(pm, c)
if gcd >= 1000:
print('a =', a, ', b =', b, ', c =', c,
', powmod =', pm, ', gcd =', gcd)
break
Output:
a = 2645680092 , b = 3562429202 , c = 3045001828 ,
powmod = 592284924 , gcd = 1892
If you have a GCC or Clang compiler, you can make Word 64-bit and DWord 128-bit by changing the following lines of code:
using Word = uint64_t;
using DWord = unsigned __int128;
As written, my code supports 32-bit inputs; after this change it accepts 64-bit inputs.
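Here is a sketch of the 64-bit variant with those two lines changed, assuming GCC or Clang (unsigned __int128 is a compiler extension, not standard C++). Two small tweaks are mine rather than the original answer's: the base is reduced modulo c up front, and r starts at 1 % c so that c == 1 returns 0. As before, c must be nonzero.

```cpp
#include <cassert>
#include <cstdint>

using Word = uint64_t;
using DWord = unsigned __int128;  // GCC/Clang extension holding the 128-bit product

// Exponentiation by squaring; every product is formed in 128 bits before reducing.
Word PowMod(Word a, Word b, Word c) {
    Word r = 1 % c;   // 1 % c so that PowMod(x, 0, 1) == 0
    a %= c;           // reduce the base first
    while (b != 0) {
        if (b & 1)
            r = Word((DWord(r) * a) % c);
        a = Word((DWord(a) * a) % c);
        b >>= 1;
    }
    return r;
}

// Euclid's algorithm, unchanged apart from the wider Word type.
Word GCD(Word a, Word b) {
    while (b != 0) {
        Word t = b;
        b = a % b;
        a = t;
    }
    return a;
}
```

With a modulus close to 2^64, such as the prime 2^64 - 59, the intermediate products need all 128 bits, which is exactly what the wider DWord provides.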
Part 2. Using the GMP large-integer arithmetic library.
If your input integers are themselves large, you can use the excellent GMP library for arbitrary-precision arithmetic (it supports integers, rationals and floating-point numbers).
This library provides all the usual arithmetic operations, including modular exponentiation (PowMod), and many number-theoretic functions (including GCD). It is also very popular and highly optimized.
In the following code I do the same things as in my code above, but using only GMP's functions. As an example I use 512-bit integers to show that it accepts large inputs (it can handle even millions of digits):
#include <iostream>
#include <cstdlib>
#include <gmpxx.h>
int main() {
mpz_class const
a("1953143455988359840868749111326065201169739169335107410565117106311318704164104986194255770982854472823807334163384557922525376038346976291413843761504166", 10),
b("5126002245539530470958611905297854592859344951467500786493685495603638740444446597426402800257519403404965463713689509774040138494219032682986554069941558", 10),
c("4396071968291195248321035664209400217968667450140674696924686844534284953565382985421958604880273584922294910355449271193696338132720472184903935323837626", 10);
mpz_class powmod, gcd;
// PowMod
mpz_powm(powmod.get_mpz_t(), a.get_mpz_t(), b.get_mpz_t(), c.get_mpz_t()); // 1632164707041502536171492944083090257113212090861915134477312917063125646194834706890409016008321666479437224930114914370387958138698748075752168351835856
// GCD
mpz_gcd(gcd.get_mpz_t(), powmod.get_mpz_t(), c.get_mpz_t()); // 51842
// Output
std::cout << "PowMod = " << powmod.get_str() << std::endl
<< "GCD = " << gcd.get_str() << std::endl;
}
Output:
PowMod = 1632164707041502536171492944083090257113212090861915134477312917063125646194834706890409016008321666479437224930114914370387958138698748075752168351835856
GCD = 51842
To use the GMP library under Linux, install it with sudo apt install libgmp-dev and compile with clang++ -std=c++11 -O2 -o main main.cpp -lgmpxx -lgmp (the C++ interface from gmpxx.h lives in libgmpxx, and the libraries should come after the source file on the command line).
Using GMP under Windows is a bit trickier. One way is to build the MPIR library yourself; it is a Windows-friendly fork of GMP. Another way is to install MSYS and use the prebuilt GMP from there, following the instructions I wrote in my other answer.
I already have the Intel oneAPI Base Toolkit installed, along with Eclipse for C/C++ (eclipse-inst-jre-linux64.tar.gz), but I can't find a way to run a simple OpenMP example.
In the terminal I compile my example with:
icpx -fiopenmp -fopenmp-targets=spir64 random_openmp.cpp
but I can't do the same in Eclipse.
Please find the example code below:
# include <iostream>
# include <iomanip>
# include <cmath>
# include <ctime>
# include <omp.h>
using namespace std;
int main ( );
void monte_carlo ( int n, int &seed );
double random_value ( int &seed );
void timestamp ( );
/******************************************************************************/
int main ( void )
/******************************************************************************/
/*
Purpose:
MAIN is the main program for RANDOM_OPENMP.
Discussion:
This program simply explores one issue in the generation of random
numbers in a parallel program. If the random number generator uses
an integer seed to determine the next entry, then it is not easy for
a parallel program to reproduce the same exact sequence.
But what is worse is that it might not be clear how the separate
OpenMP threads should handle the SEED value - as a shared or private
variable? It seems clear that each thread should have a private
seed that is initialized to a distinct value at the beginning of
the computation.
Licensing:
This code is distributed under the GNU LGPL license.
Modified:
03 September 2012
Author:
John Burkardt
*/
{
int n;
int seed;
timestamp ( );
cout << "\n";
cout << "RANDOM_OPENMP\n";
cout << " C++ version\n";
cout << " An OpenMP program using random numbers.\n";
cout << " The random numbers depend on a seed.\n";
cout << " We need to insure that each OpenMP thread\n";
cout << " starts with a different seed.\n";
cout << "\n";
cout << " Number of processors available = " << omp_get_num_procs ( ) << "\n";
cout << " Number of threads = " << omp_get_max_threads ( ) << "\n";
n = 100;
seed = 123456789;
monte_carlo ( n, seed );
/*
Terminate.
*/
cout << "\n";
cout << "RANDOM_OPENMP\n";
cout << " Normal end of execution.\n";
cout << "\n";
timestamp ( );
return 0;
}
/******************************************************************************/
void monte_carlo ( int n, int &seed )
/******************************************************************************/
/*
Purpose:
MONTE_CARLO carries out a Monte Carlo calculation with random values.
Licensing:
This code is distributed under the GNU LGPL license.
Modified:
03 September 2012
Author:
John Burkardt
Parameter:
Input, int N, the number of values to generate.
Input, int &SEED, a seed for the random number generator.
*/
{
int i;
int my_id;
int *my_id_vec;
int my_seed;
int *my_seed_vec;
double *x;
x = new double[n];
my_id_vec = new int[n];
my_seed_vec = new int[n];
# pragma omp master
{
cout << "\n";
cout << " Thread Seed I X(I)\n";
cout << "\n";
}
# pragma omp parallel private ( i, my_id, my_seed ) shared ( my_id_vec, my_seed_vec, n, x )
{
my_id = omp_get_thread_num ( );
my_seed = seed + my_id;
cout << " " << setw(6) << my_id
<< " " << setw(12) << my_seed << "\n";
# pragma omp for
for ( i = 0; i < n; i++ )
{
my_id_vec[i] = my_id;
x[i] = random_value ( my_seed );
my_seed_vec[i] = my_seed;
// cout << " " << setw(6) << my_id
// << " " << setw(12) << my_seed
// << " " << setw(6) << i
// << " " << setw(14) << x[i] << "\n";
}
}
//
// C++ OpenMP IO from multiple processors comes out chaotically.
// For this reason only, we'll save the data from the loop and
// print it in the sequential section!
//
for ( i = 0; i < n; i++ )
{
cout << " " << setw(6) << my_id_vec[i]
<< " " << setw(12) << my_seed_vec[i]
<< " " << setw(6) << i
<< " " << setw(14) << x[i] << "\n";
}
delete [] my_id_vec;
delete [] my_seed_vec;
delete [] x;
return;
}
/******************************************************************************/
double random_value ( int &seed )
/******************************************************************************/
/*
Purpose:
RANDOM_VALUE generates a random value R.
Discussion:
This is not a good random number generator. It is a SIMPLE one.
It illustrates a model which works by accepting an integer seed value
as input, performing some simple operation on the seed, and then
producing a "random" real value using some simple transformation.
Licensing:
This code is distributed under the GNU LGPL license.
Modified:
03 September 2012
Author:
John Burkardt
Parameters:
Input/output, int &SEED, a seed for the random
number generator.
Output, double RANDOM_VALUE, the random value.
*/
{
double r;
seed = ( seed % 65536 );
seed = ( ( 3125 * seed ) % 65536 );
r = ( double ) ( seed ) / 65536.0;
return r;
}
//****************************************************************************80
void timestamp ( )
//****************************************************************************80
//
// Purpose:
//
// TIMESTAMP prints the current YMDHMS date as a time stamp.
//
// Example:
//
// 31 May 2001 09:45:54 AM
//
// Modified:
//
// 24 September 2003
//
// Author:
//
// John Burkardt
//
// Parameters:
//
// None
//
{
# define TIME_SIZE 40
static char time_buffer[TIME_SIZE];
const struct tm *tm;
time_t now;
now = time ( NULL );
tm = localtime ( &now );
strftime ( time_buffer, TIME_SIZE, "%d %B %Y %I:%M:%S %p", tm );
cout << time_buffer << "\n";
return;
# undef TIME_SIZE
}
There is an article explaining how to use the Intel C++ Compiler in Eclipse here:
https://software.intel.com/content/www/us/en/develop/articles/intel-c-compiler-for-linux-using-intel...
There is also more recent documentation on running a sample project in Eclipse here:
https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-oneapi-base-linux/top/run-a-sample-project-using-an-ide.html
and
https://software.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-oneapi-hpc-linux/top/run-a-sample-project-with-eclipse.html
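The HPC Toolkit Get Started guide uses the matrix sample, which has an OpenMP version. The key step is to launch Eclipse from a terminal window in which the environment has been set up with "setvars.sh" (for a default installation, source /opt/intel/oneapi/setvars.sh and then start eclipse from that same shell), so that Eclipse inherits the compiler paths.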
I am trying to learn how to use cursors in pqxx.
I found pqxx::cursor_base in the reference and there are several subclasses that derive from pqxx::cursor_base.
After Googling the topic for hours, I can't find any sample code or anything explaining how to use pqxx cursors.
Any suggestions?
There are surprisingly few cursor examples to be found. Here's what I use:
#include <pqxx/pqxx>
#include <iostream>
#include <string>

// opt::* holds the connection settings (defined elsewhere in the program)
const std::string connStr("user=" + opt::dbUser + " password=" + opt::dbPasswd + " host=" + opt::dbHost + " dbname=" + opt::dbName);
pqxx::connection conn(connStr);
pqxx::work txn(conn);
std::string selectString = "SELECT id, name FROM table_name WHERE condition";
pqxx::stateless_cursor<pqxx::cursor_base::read_only, pqxx::cursor_base::owned>
cursor(txn, selectString, "myCursor", false);
//cursor variables
size_t idx = 0; //starting location
size_t step = 10000; //number of rows for each chunk
pqxx::result result;
do{
//get next cursor chunk and update the index
result = cursor.retrieve( idx, idx + step );
idx += step;
size_t records = result.size();
std::cout << idx << ": records pulled = " << records << std::endl;
for( auto const &row : result ){
//iterate over cursor rows, e.g. row[0].as<int>()
}
}
while( result.size() == step ); //if the result.size() != step, we're on our last loop
std::cout << "Done!" << std::endl;
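The paging logic in that loop can be tried out without a database. In this sketch a std::vector<int> stands in for the query result, and a hypothetical FakeRetrieve(rows, begin, end) plays the role of cursor.retrieve(begin, end) by returning at most end - begin rows; none of this is libpqxx API. It shows that the loop stops on the first short chunk, and that when the row count is an exact multiple of step, one extra empty chunk is fetched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for stateless_cursor::retrieve(begin, end):
// returns rows [begin, end) clipped to the size of the data.
std::vector<int> FakeRetrieve(const std::vector<int>& rows,
                              std::size_t begin, std::size_t end) {
    begin = std::min(begin, rows.size());
    end = std::min(end, rows.size());
    return std::vector<int>(rows.begin() + begin, rows.begin() + end);
}

// Same do/while shape as the answer; returns the number of chunks fetched
// and stores the total number of rows seen in `total`.
std::size_t CountChunks(const std::vector<int>& rows, std::size_t step,
                        std::size_t& total) {
    std::size_t idx = 0, chunks = 0;
    std::vector<int> result;
    total = 0;
    do {
        result = FakeRetrieve(rows, idx, idx + step);
        idx += step;
        ++chunks;
        total += result.size();
    } while (result.size() == step);  // a short chunk means we reached the end
    return chunks;
}
```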
When I add a condition inside my overloaded operator+, the compiler creates an extra object, but in a strange way:
Created 0x73fe30
Created 0x73fe20
Created 0x73fdd0
Deleted 0x73fdd0 / 9
Press any key to continue . . .
Deleted 0x73fe10 / 9
Deleted 0x73fe20 / 4
Deleted 0x73fe30 / 5
How can I eliminate this?
If I remove the condition, the code runs as expected:
Created 0x73fe30
Created 0x73fe20
Created 0x73fe10
Press any key to continue . . .
Deleted 0x73fe10 / 9
Deleted 0x73fe20 / 4
Deleted 0x73fe30 / 5
The code:
#include <iostream>
using std::cout;
using std::ostream;
using std::endl;
class Numbers
{
friend Numbers operator+(Numbers & a, Numbers & b) {
if (a.value){ // Condition. May be any condition e.g. true
Numbers c(a.value+b.value);
return c;
}
}
public:
int value;
Numbers(){}
Numbers(int value) : value(value) {cout << "Created " << this << endl;}
~Numbers(){
cout << "Deleted " << this << " / " << value << endl;
}
};
int main(){
Numbers a(5);
Numbers b(4);
Numbers c = a+b;
system("pause");
}
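The missing "Created" line is the clue: the object at 0x73fe10 is made by the compiler-generated copy constructor, which prints nothing. With the condition in place, the named local c is no longer returned on every path, so the compiler apparently stops constructing it directly in the caller's storage (named return value optimization) and copies it instead. Note also that when the condition is false, the original function falls off the end without a return statement, which is undefined behavior for a function returning Numbers. A sketch of one fix, assuming C++17 (where returning a temporary guarantees the copy is elided): return a temporary on every path. The fallback value and the static counters are my additions for illustration.

```cpp
#include <cassert>

class Numbers {
public:
    static int constructed;  // value constructions (illustrative counter)
    static int copied;       // copy constructions (illustrative counter)

    int value = 0;

    Numbers() = default;
    Numbers(int v) : value(v) { ++constructed; }
    Numbers(const Numbers& other) : value(other.value) { ++copied; }

    friend Numbers operator+(const Numbers& a, const Numbers& b) {
        if (a.value) {                          // the original condition
            return Numbers(a.value + b.value);  // temporary: built in place
        }
        return Numbers(b.value);                // hypothetical fallback; every path now returns
    }
};

int Numbers::constructed = 0;
int Numbers::copied = 0;
```

With this version, Numbers c = a + b; constructs the result once and never calls the copy constructor, whichever branch is taken.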
Imagine a std::vector, say, with 100 things on it (0 to 99) currently. You are treating it as a loop. So the 105th item is index 4; forward 7 from index 98 is 5.
You want to delete N items after index position P.
So, delete 5 items after index 50; easy.
Or 5 items after index 99: you delete index 0 five times (or indices 4 down to 0), noting that position 99 itself will cease to exist as the array shrinks.
Worst of all, 5 items after index 97: you have to deal with both modes of deletion.
What's the elegant and solid approach?
Here's a boring routine I wrote:
-(void)knotRemovalHelper:(NSMutableArray*)original
after:(NSInteger)nn howManyToDelete:(NSInteger)desired
{
#define ORCO ((NSInteger)[original count])
static NSInteger kount, howManyUntilLoop, howManyExtraAferLoop;
if ( ... our array is NOT a loop ... )
// trivial, if messy...
{
for ( kount = 1; kount<=desired; ++kount )
{
if ( (nn+1) >= ORCO )
return;
[original removeObjectAtIndex:( nn+1 )];
}
return;
}
else // our array is a loop
// messy, confusing and inelegant. how to improve?
// here we go...
{
howManyUntilLoop = (ORCO-1) - nn;
if ( howManyUntilLoop > desired )
{
for ( kount = 1; kount<=desired; ++kount )
[original removeObjectAtIndex:( nn+1 )];
return;
}
howManyExtraAferLoop = desired - howManyUntilLoop;
for ( kount = 1; kount<=howManyUntilLoop; ++kount )
[original removeObjectAtIndex:( nn+1 )];
for ( kount = 1; kount<=howManyExtraAferLoop; ++kount )
[original removeObjectAtIndex:0];
return;
}
#undef ORCO
}
Update!
Invariant's second answer leads to the following excellent and very simple solution. "Starting with" is much better than "starting after", so the routine now uses "starting with":
N times do if P < currentsize remove P else remove 0
-(void)removeLoopilyFrom:(NSMutableArray*)ra
startingWithThisOne:(NSInteger)removeThisOneFirst
howManyToDelete:(NSInteger)countToDelete
{
// exception if removeThisOneFirst > ra highestIndex
// exception if countToDelete is > ra size
// so easy thanks to Invariant:
for ( NSInteger k = 0; k < countToDelete; ++k )
{
if ( removeThisOneFirst < (NSInteger)[ra count] )
[ra removeObjectAtIndex:removeThisOneFirst];
else
[ra removeObjectAtIndex:0];
}
}
Update!
Toolbox has pointed out the excellent idea of building the result in a new array instead: super KISS.
Here's an idea off the top of my head.
First, generate an array of integers representing the indices to remove. So "remove 5 from index 97" would generate [97,98,99,0,1]. This can be done with the application of a simple modulus operator.
Then, sort this array descending giving [99,98,97,1,0] and then remove the entries in that order.
Should work in all cases.
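The sort-and-erase idea above can be sketched in C++ on std::vector. RemoveWrapped is a hypothetical name; start is inclusive, as in the [97,98,99,0,1] example, and count must not exceed the size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Removes `count` elements starting at index `start`, wrapping past the end.
// Indices are listed first, then erased from highest to lowest so that each
// erase never shifts an index that still has to be removed.
template <typename T>
void RemoveWrapped(std::vector<T>& v, std::size_t start, std::size_t count) {
    const std::size_t n = v.size();
    assert(count <= n);
    std::vector<std::size_t> idx;
    for (std::size_t k = 0; k < count; ++k)
        idx.push_back((start + k) % n);                              // e.g. 97,98,99,0,1
    std::sort(idx.begin(), idx.end(), std::greater<std::size_t>());  // 99,98,97,1,0
    for (std::size_t i : idx)
        v.erase(v.begin() + static_cast<std::ptrdiff_t>(i));
}
```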
This solution seems to work, and it copies all remaining elements in the vector only once (to their final destination).
Assume kNumElements, kStartIndex, and kNumToRemove are defined as const size_t values.
vector<int> my_vec(kNumElements);
for (size_t i = 0; i < my_vec.size(); ++i) {
my_vec[i] = i;
}
for (size_t i = 0, cur = 0; i < my_vec.size(); ++i) {
// What is the "distance" from the current index to the start, taking
// into account the wrapping behavior?
size_t distance = (i + kNumElements - kStartIndex) % kNumElements;
// If it's not one of the ones to remove, then we keep it by copying it
// into its proper place.
if (distance >= kNumToRemove) {
my_vec[cur++] = my_vec[i];
}
}
my_vec.resize(kNumElements - kNumToRemove);
There's nothing wrong with two-loop solutions as long as they're readable and don't do anything redundant. I don't know Objective-C syntax, but here's the pseudocode approach I'd take:
if (Len <= after + howManyToDelete) //will need a second loop
    firstpass = Len - 1 - after; //indices after+1 .. Len-1 are handled in the first loop, the rest in the second
else
    firstpass = howManyToDelete; //the first loop will get them all
for (kount = 0; kount < firstpass; kount++)
    remove after+1
for ( ; kount < howManyToDelete; kount++) //if firstpass < howManyToDelete, clean up leftovers
    remove 0
This solution doesn't use mod, does the limit calculation outside the loop, and touches the relevant samples once each. The second for loop won't execute if all the samples were handled in the first loop.
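A C++ sketch of the same two-loop idea on std::vector (my translation of the pseudocode; DeleteAfter is a hypothetical name). Like the question, it deletes the elements that follow index after, wrapping to the front; std::min replaces the if/else but computes the same firstPass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Deletes `howMany` elements that follow index `after`, wrapping to the front.
// The split between the two loops is computed once, before either loop runs.
template <typename T>
void DeleteAfter(std::vector<T>& v, std::size_t after, std::size_t howMany) {
    const std::size_t len = v.size();
    assert(after < len && howMany < len);  // element `after` itself is kept
    const std::size_t tail = len - 1 - after;               // indices after+1 .. len-1
    const std::size_t firstPass = std::min(howMany, tail);
    std::size_t k = 0;
    for (; k < firstPass; ++k)
        v.erase(v.begin() + static_cast<std::ptrdiff_t>(after + 1));  // next one slides into place
    for (; k < howMany; ++k)
        v.erase(v.begin());                                          // leftovers come from the front
}
```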
The common way to do this in DSP is with a circular buffer. This is just a fixed-length buffer with two associated counters:
//make sure BUFSIZE is a power of 2 for quick mod trick
#define BUFSIZE 1024
int CircBuf[BUFSIZE];
int InCtr, OutCtr;
void PutData(int *Buf, int count) {
int srcCtr;
int destCtr = InCtr & (BUFSIZE - 1); // if BUFSIZE is a power of 2, equivalent to and faster than destCtr = InCtr % BUFSIZE
for (srcCtr = 0; (srcCtr < count) && (destCtr < BUFSIZE); srcCtr++, destCtr++)
CircBuf[destCtr] = Buf[srcCtr];
for (destCtr = 0; srcCtr < count; srcCtr++, destCtr++)
CircBuf[destCtr] = Buf[srcCtr];
InCtr += count;
}
void GetData(int *Buf, int count) {
int srcCtr = OutCtr & (BUFSIZE - 1);
int destCtr = 0;
for (destCtr = 0; (srcCtr < BUFSIZE) && (destCtr < count); srcCtr++, destCtr++)
Buf[destCtr] = CircBuf[srcCtr];
for (srcCtr = 0; destCtr < count; srcCtr++, destCtr++) // stop when the output holds count items
Buf[destCtr] = CircBuf[srcCtr];
OutCtr += count;
}
int BufferOverflow() {
return ((InCtr - OutCtr) > BUFSIZE);
}
This is pretty lightweight, but effective. And aside from the ctr = BigCtr & (SIZE-1) stuff, I'd argue it's highly readable. The only reason for the & trick is that in old DSP environments mod was an expensive operation, so for something that ran often, like every time a buffer was ready for processing, you'd find ways to avoid it. And if you were doing FFTs, your buffers were probably a power of 2 anyway.
These days, of course, you have 1 GHz processors and magically resizing arrays. You kids get off my lawn.
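To see why the & trick works: for a power-of-two size N, N - 1 is a mask of the low bits, so x & (N - 1) equals x % N for any unsigned x. A small check, using uint32_t counters (my choice; the answer's int counters give the same result while they stay non-negative). The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// True when n has exactly one bit set.
constexpr bool IsPowerOfTwo(uint32_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Wrap with a mask: only correct when IsPowerOfTwo(n).
uint32_t WrapMask(uint32_t x, uint32_t n) { return x & (n - 1); }

// Wrap with the (historically slower) modulo operator.
uint32_t WrapMod(uint32_t x, uint32_t n) { return x % n; }
```

For a size such as 1000 the mask no longer matches: 1000 & 999 is 992, not 0, which is why BUFSIZE must be a power of 2.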
Another method:
N times do {remove entry at index P mod max(ArraySize, P)}
Example:
N=5, P=97, ArraySize=100
1: max(100, 97)=100 so remove at 97%100 = 97
2: max(99, 97)=99 so remove at 97%99 = 97 // array size is now 99
3: max(98, 97)=98 so remove at 97%98 = 97
4: max(97, 97)=97 so remove at 97%97 = 0
5: max(96, 97)=97 so remove at 97%97 = 0
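Here is that "remove at P mod max(size, P)" loop as a C++ sketch on std::vector (my translation; the answer gives only the formula, and RemoveFromP is a hypothetical name). While P is a valid index it removes P; once the vector has shrunk to P elements or fewer, P % P == 0 and removal continues at the front, which matches the "if P < size remove P, else remove 0" rule from the question's update.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// N times: remove the entry at index P mod max(size, P).
template <typename T>
void RemoveFromP(std::vector<T>& v, std::size_t p, std::size_t n) {
    assert(p < v.size() && n <= v.size());
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t m = std::max(v.size(), p);
        v.erase(v.begin() + static_cast<std::ptrdiff_t>(p % m));
    }
}
```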
I don't program for the iPhone, so I'm imagining a std::vector; with it, this is quite easy, simple and elegant enough:
#include <iostream>
using std::cout;
#include <vector>
using std::vector;
#include <cassert> //no need for using, assert is macro
template<typename T>
void eraseCircularVector(vector<T> & vec, size_t position, size_t count)
{
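assert(count <= vec.size());
if (count == vec.size()) // erasing everything: the modulo arithmetic below would compute an empty range
{
vec.clear();
return;
}
if (count > 0)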
{
position %= vec.size(); //normalize position
size_t positionEnd = (position + count) % vec.size();
if (positionEnd < position)
{
vec.erase(vec.begin() + position, vec.end());
vec.erase(vec.begin(), vec.begin() + positionEnd);
}
else
vec.erase(vec.begin() + position, vec.begin() + positionEnd);
}
}
int main()
{
vector<int> values;
for (int i = 0; i < 10; ++i)
values.push_back(i);
cout << "Values: ";
for (vector<int>::const_iterator cit = values.begin(); cit != values.end(); cit++)
cout << *cit << ' ';
cout << '\n';
eraseCircularVector(values, 5, 1); //remains 9: 0,1,2,3,4,6,7,8,9
eraseCircularVector(values, 16, 5); //remains 4: 3,4,6,7
cout << "Values: ";
for (vector<int>::const_iterator cit = values.begin(); cit != values.end(); cit++)
cout << *cit << ' ';
cout << '\n';
return 0;
}
However, you might consider:
creating a new loop_vector class, if you use this kind of functionality often enough
using std::list if you perform many deletions (or few deletions on a large array, when they are not from the end; deleting from the end is a simple pop_back)
If your container (NSMutableArray or whatever) is not a list but a vector (i.e. a resizable array), you most definitely don't want to delete items one by one, but as a whole range (e.g. std::vector's erase(begin, end)).
Edit: in response to a comment, to fully realize what a vector must do when you erase any element other than the last one: it must copy all the values after that element (e.g. with 1000 items in the array, erasing the first one means 999 copies (moves) of items, which is very costly).
Example:
#include <iostream>
#include <vector>
#include <ctime>
using namespace std;
int main()
{
clock_t start, end;
vector<int> vec;
const int items = 64 * 1024;
cout << "using " << items << " items in vector\n";
for (size_t i = 0; i < items; ++i) vec.push_back(i);
start = clock();
while (!vec.empty()) vec.erase(vec.begin());
end = clock();
cout << "Inefficient method took: "
<< (end - start) * 1.0 / CLOCKS_PER_SEC << " s\n"; // clock ticks / CLOCKS_PER_SEC gives seconds
for (size_t i = 0; i < items; ++i) vec.push_back(i);
start = clock();
vec.erase(vec.begin(), vec.end());
end = clock();
cout << "Efficient method took: "
<< (end - start) * 1.0 / CLOCKS_PER_SEC << " s\n";
return 0;
}
Produces output:
using 65536 items in vector
Inefficient method took: 1.705 s
Efficient method took: 0 s
Note how easy it is to be inefficient; see e.g. http://www.cplusplus.com/reference/stl/vector/erase/