iText 7 Get all text chunks and their location - itext

I'm trying to get all the individual text chunks and their locations using iText 7. I need to parse the chunks and let a user select the specific text to be parsed from future identical PDFs. The user needs to select the employee name, ID, and other text. I will then store the location of that data to use when processing future PDFs. The selection/mapping is a one-time setup. Here is an example of the data I get back in iText 5:
X          Y          Chunk text
1.763889   282.9278   11225 North Sourth St, Bedrock, ND, 99780, Ph: 999 321-6543
15.166975  277.17752  Employee ID
67.3675    277.17752  Employee Name
18.458744  272.22098  001159
68.33058   272.22098  Fred Flintstone
The code below works in iText 5 and I'm trying to migrate it to iText 7 for a project. Some of the objects are no longer exposed in iText 7.
void Main()
{
string filename = @"C:\Employee Info.pdf";
using (var reader = new PdfReader(filename))
{
var parser = new PdfReaderContentParser(reader);
var strategy = parser.ProcessContent(1, new LocationTextExtractionStrategyWithPosition());
// work with the locations of the chunks to provide a list to the user.
reader.Close();
}
}
public class LocationTextExtractionStrategyWithPosition : LocationTextExtractionStrategy
{
public readonly List<TextChunk> locationalResult = new List<TextChunk>();
private readonly ITextChunkLocationStrategy tclStrat;
public LocationTextExtractionStrategyWithPosition() : this(new TextChunkLocationStrategyDefaultImp())
{
}
public LocationTextExtractionStrategyWithPosition(ITextChunkLocationStrategy strat)
{
tclStrat = strat;
}
public override void RenderText(TextRenderInfo renderInfo)
{
LineSegment segment = renderInfo.GetBaseline();
if (renderInfo.GetRise() != 0)
{ // remove the rise from the baseline - we do this because the text from a super/subscript render operations should probably be considered as part of the baseline of the text the super/sub is relative to
Matrix riseOffsetTransform = new Matrix(0, -renderInfo.GetRise());
segment = segment.TransformBy(riseOffsetTransform);
}
TextChunk tc = new TextChunk(renderInfo.GetText(), tclStrat.CreateLocation(renderInfo, segment));
//tc.Dump();
locationalResult.Add(tc);
}
}
public class TextLocation
{
public float X { get; set; }
public float Y { get; set; }
public string Text { get; set; }
}
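Once the chunks are available as TextLocation values, the "store the location, reuse it on future PDFs" step from the question can be sketched independently of iText. This is only a minimal sketch: ChunkLookup, FindAt, and the tolerance parameter are hypothetical names introduced here, and it does not show how to extract the chunks with iText 7.

```csharp
using System;
using System.Collections.Generic;

public class TextLocation
{
    public float X { get; set; }
    public float Y { get; set; }
    public string Text { get; set; }
}

public static class ChunkLookup
{
    // Returns the chunk closest to the stored (x, y) position,
    // or null when no chunk lies within the given tolerance.
    public static TextLocation FindAt(IEnumerable<TextLocation> chunks, float x, float y, float tolerance)
    {
        TextLocation best = null;
        double bestDistance = double.MaxValue;
        foreach (var chunk in chunks)
        {
            double dx = chunk.X - x;
            double dy = chunk.Y - y;
            double distance = Math.Sqrt(dx * dx + dy * dy);
            if (distance <= tolerance && distance < bestDistance)
            {
                best = chunk;
                bestDistance = distance;
            }
        }
        return best;
    }
}
```

With the sample data above, a stored position of roughly (68.3, 272.2) and a tolerance of 1 would select the "Fred Flintstone" chunk. The tolerance is there because coordinates in "identical" PDFs may differ slightly.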

Related

Why does MongoDB store binary data in the form of base64?

I'm trying to learn how MongoDB stores each data type under the hood.
I have found that it stores data in BSON format.
To store binary data in MongoDB, users have to convert the byte array to base64 and then pass that base64 string to the BinData(subtype, content in base64) class.
What is the reason for storing binary data in this format? Why doesn't MongoDB allow us to store raw binary?
According to the BSON specification, binary is stored as a 32-bit length followed by a type identifier and then by a series of bytes. Not a base64 string, bytes.
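To make that layout concrete, here is a minimal sketch that hand-encodes the value part of a BSON binary element (length, subtype, raw bytes). It does not use the MongoDB driver; BsonBinarySketch and EncodeBinaryValue are hypothetical names introduced for illustration, and the element-type byte and field name that precede the value in a real document are left out.

```csharp
using System;
using System.IO;

public static class BsonBinarySketch
{
    // Encodes: int32 payload length (little-endian), one subtype byte, then the raw bytes.
    public static byte[] EncodeBinaryValue(byte[] payload, byte subtype = 0x00)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(payload.Length); // BinaryWriter writes Int32 in little-endian order
            writer.Write(subtype);        // 0x00 is the generic binary subtype
            writer.Write(payload);        // the bytes themselves, no base64 involved
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

For the four-byte payload DE AD BE EF this yields nine bytes, 04-00-00-00-00-DE-AD-BE-EF. The base64 form of the same payload, "3q2+7w==", only shows up when the value has to be written as text.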
A shamelessly copied function from codementor.io shows that in languages that have the capability to handle binary data directly, it can be directly stored:
public class Question
{
[BsonId]
[BsonRepresentation(BsonType.ObjectId)]
public string Id { get; set; }
public string Category { get; set; }
public string Type { get; set; }
public string QuestionHeading { get; set; }
public byte[] ContentImage { get; set; }
public decimal Score { get; set; }
}
public class QuestionService
{
private readonly IMongoCollection<Question> _questions;
public QuestionService() // the settings parameter was unused, and Main below calls the parameterless constructor
{
var client = new MongoClient("<YOUR CONNECTION STRING>");
var database = client.GetDatabase("<YOUR DATABASE NAME>");
_questions = database.GetCollection<Question>("Questions");
}
public Question Get(string id)
{
var result = _questions.Find(
q => q.Id == id).FirstOrDefault();
return result;
}
public Question Create(Question question)
{
_questions.InsertOne(question);
return question;
}
}
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
var service = new QuestionService();
// CONVERT JPG TO A BYTE ARRAY
byte[] binaryContent = File.ReadAllBytes("image.jpg");
var question = new Question
{
Category = "Children's Quizzes",
Type = "Puzzles",
QuestionHeading = "Find the cat in the below image",
ContentImage = binaryContent, // Store the byte array in ContentImage property
Score = 10
};
service.Create(question);
}
BinData() is a constructor that permits specifying binary data directly in the source code text.
geeksforgeeks.org has an example of how to upload and retrieve an image in MongoDB using Mongoose.

private variable is still accessible from another class in C#

I have a .cs file which looks like the following
namespace TarkovMapper.ClassObjects
{
class PointCloud_Object
{
public void AddPoint(PointEntry_Object point)
{
PointLayer pointLayer = LoadPointLayer(path);
pointLayer.Points[point.Location_x,point.Location_y]++;
}
private PointLayer LoadPointLayer(string path)
{
if (!File.Exists(path)) return new PointLayer(this.Width, this.Height);
Stream s = File.OpenRead(path);
BinaryFormatter b = new BinaryFormatter();
PointLayer returnObject = (PointLayer) b.Deserialize(s);
s.Close();
return returnObject;
}
}
[Serializable]
class PointLayer
{
public PointLayer(int width, int height)
{
this.Points = new int[width, height];
}
public int[,] Points { get; private set; } // <- private set!!!
public int Maximum { get; private set; }
}
}
My question is about the property "Points" in the class PointLayer.
Even though it has the modifier private set;, the following line in PointCloud_Object compiles without any issue: pointLayer.Points[point.Location_x,point.Location_y]++;.
Why is that?
The modifier refers to the Points array, not the array's individual elements.
The PointCloud_Object class cannot assign a new array to the PointLayer.Points property, but it can read the array through the public getter and then manipulate the individual array elements.
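A minimal sketch of the distinction, using a cut-down Layer class (a hypothetical stand-in for PointLayer) and a hypothetical Demo.Run helper:

```csharp
using System;

public class Layer
{
    public Layer(int width, int height)
    {
        Points = new int[width, height];
    }

    // The private setter only prevents outside code from replacing the array reference.
    public int[,] Points { get; private set; }
}

public static class Demo
{
    public static int Run()
    {
        var layer = new Layer(2, 2);

        // Allowed: this calls the public getter and then mutates the array it returns.
        layer.Points[0, 1]++;

        // Not allowed: this would call the private setter and does not compile (CS0272).
        // layer.Points = new int[3, 3];

        return layer.Points[0, 1]; // 1
    }
}
```

If outside code should not be able to change the elements either, the class would have to stop handing out the array itself, for example by exposing a read-only indexer or a copy.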

Related data not being added for existing parent entity

I'm trying to save a rating against a place. I have the code below, but it doesn't seem to save the rating (to the ratings table) for an existing entity.
place.Ratings.Add(rating);
_placeRepository.AddPlaceIfItDoesntExist(place);
_placeRepository.Save();
This is the repository method
public void AddPlaceIfItDoesntExist(Place place)
{
var placeItem = context.Places.FirstOrDefault(x => x.GooglePlaceId == place.GooglePlaceId);
if(placeItem==null)
{
context.Places.Add(place);
}
else
{
context.Entry(placeItem).State = EntityState.Modified;
}
}
and this is the poco
public class Place
{
public Place()
{
Ratings = new List<Rating>();
}
public int Id { get; set; }
public string Name { get; set; }
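public string GooglePlaceId { get; set; }
public virtual ICollection<Rating> Ratings { get; set; } // assumed: omitted from the post, but the constructor initializes it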
}
I think the crux of the problem is that I need to check whether the place exists based on GooglePlaceId (a string) rather than the Id (both are unique per place, by the way).
Here
context.Entry(placeItem).State = EntityState.Modified;
you just mark the existing placeItem object as modified. But it's a different instance from the passed place object, and hence still contains the original values.
Instead, replace that line with:
context.Entry(placeItem).CurrentValues.SetValues(place);
Alternatively, you can use the DbSetMigrationsExtensions.AddOrUpdate method overload that allows you to pass a custom identification expression:
using System.Data.Entity.Migrations;
public void AddPlaceIfItDoesntExist(Place place)
{
context.Places.AddOrUpdate(p => p.GooglePlaceId, place);
}

Compute totals of invoice server side when invoice/invoiceLine changed

I use Breeze with Durandal (still 1.2) and I am facing a problem which I haven't found an easy solution for. I have 2 entities: Invoice & InvoiceLine like described below:
public class Invoice
{
[Key]
public int Id { get; set; }
public string InvoiceNumber { get; set; }
public string Comment { get; set; }
public double? TotalExclVAT { get; set; }
public double? TotalInclVAT { get; set; }
public double? TotalVAT { get; set; }
public bool? WithoutVAT { get; set; }
public virtual List<InvoiceLine> Lines { get; set; }
}
public class InvoiceLine
{
[Key]
public int Id { get; set; }
public string Description { get; set; }
public double VatPercent { get; set; }
public double Amount { get; set; }
public int InvoiceId { get; set; }
public virtual Invoice Invoice { get; set; }
}
I need to compute the totals of the invoice (TotalExclVAT, TotalInclVAT, TotalVAT) in 2 cases:
Whenever someone adds/modifies an invoice line.
Whenever someone changes the flag WithoutVAT on the invoice.
I don't think it is a good idea to perform this computation client side. Performing it server side is better, mainly for security reasons.
My first thought was to do the job in the BeforeSaveEntity of Invoice & InvoiceLine.
Here is what i did:
public bool BeforeSaveEntity(EntityState entityState, EntityInfo entityInfo)
{
var invoice = entityInfo.Entity as Invoice;
...
ComputeTotal(entityInfo, invoice);
return true; // true keeps the entity in the save
}
private void ComputeTotal(EntityInfo entityInfo, Invoice invoice)
{
var query = Context.InvoiceLines.Where(x => x.InvoiceId == invoice.Id).AsEnumerable();
double totalExclVAT = 0;
double totalVAT = 0;
int percent = 0;
foreach (var line in query.ToList())
{
totalExclVAT = ...
totalVAT = ...
}
entityInfo.OriginalValuesMap.Add("TotalExclVAT", invoice.TotalExclVAT);
entityInfo.OriginalValuesMap.Add("TotalInclVAT", invoice.TotalInclVAT);
entityInfo.OriginalValuesMap.Add("TotalVAT", invoice.TotalVAT);
invoice.TotalExclVAT = totalExclVAT;
invoice.TotalInclVAT = totalExclVAT + totalVAT;
invoice.TotalVAT = totalVAT;
}
The same kind of thing is done for the invoice line. As you can see in the ComputeTotal function, I query the invoice lines from the DB, compute the totals, and save the results in the invoice.
It doesn't quite work: when a new line is added to my invoice, the DB query doesn't return the added line, because it is not yet stored in the DB.
It would have been easier to do this client side, but I don't think that is a good idea... is it?
I am sure there is another way of doing this, but I can't find it myself.
Any help is greatly appreciated.
UPDATE
Below is my first shot with this problem.
public Dictionary<Type, List<EntityInfo>> BeforeSaveEntities(Dictionary<Type, List<EntityInfo>> saveMap)
{
List<EntityInfo> invoices;
List<EntityInfo> invoiceLines;
EntityInfo ei;
if (!saveMap.TryGetValue(typeof(InvoiceLine), out invoiceLines))
{
// if we fall here it means no invoice lines exists in the saveMap
}
if (!saveMap.TryGetValue(typeof(Invoice), out invoices))
{
// if we fall here it means no invoices exists in the saveMap
// >> getting the invoice from DB and add it to the map
using (var dc = new BreezeContext())
{
int invoiceId = ((InvoiceLine)invoiceLines[0].Entity).InvoiceId;
EFContextProvider<BreezeContext> cp = new EFContextProvider<BreezeContext>();
var acc = dc.Invoices.Where(x => x.Id == invoiceId).FirstOrDefault();
ei = cp.CreateEntityInfo(acc, Breeze.WebApi.EntityState.Modified);
invoices = new List<EntityInfo>();
saveMap.Add(typeof(Invoice), invoices);
invoices.Add(ei);
}
}
// There is only 1 invoice at a time in the saveMap
Invoice invoice = (Invoice)invoices[0].Entity;
ei = invoices[0];
Dictionary<int, InvoiceLine> hashset = new Dictionary<int, InvoiceLine>();
// Retrieving values of invoice lines from database (server side)
using (var dc = new BreezeContext())
{
var linesServerSide = dc.InvoiceLines.Where(x => x.InvoiceId == invoice.Id).AsEnumerable();
foreach (var elm in linesServerSide)
{
hashset.Add(elm.Id, elm);
}
}
// Retrieving values of invoice lines from modified lines (client side)
foreach (var entityInfo in invoiceLines)
{
InvoiceLine entity = (InvoiceLine)entityInfo.Entity;
switch (entityInfo.EntityState)
{
case Breeze.WebApi.EntityState.Added:
hashset.Add(entity.Id, entity);
break;
case Breeze.WebApi.EntityState.Deleted:
hashset.Remove(entity.Id);
break;
case Breeze.WebApi.EntityState.Modified:
hashset.Remove(entity.Id);
hashset.Add(entity.Id, entity);
break;
}
}
// Computing totals based on my hashset
double totalExclVAT = 0;
double totalInclVAT = 0;
double totalVAT = 0;
foreach (var elm in hashset)
{
InvoiceLine line = elm.Value;
totalExclVAT += line.Amount;
totalVAT += line.Amount * (int)line.VatPercent / 100; // VatPercent is a non-nullable double, so there is no .Value
}
totalInclVAT = totalExclVAT + totalVAT;
// Adding keys if necessary
if (!ei.OriginalValuesMap.ContainsKey("TotalExclVAT"))
ei.OriginalValuesMap.Add("TotalExclVAT", invoice.TotalExclVAT);
if (!ei.OriginalValuesMap.ContainsKey("TotalInclVAT"))
ei.OriginalValuesMap.Add("TotalInclVAT", invoice.TotalInclVAT);
if (!ei.OriginalValuesMap.ContainsKey("TotalVAT"))
ei.OriginalValuesMap.Add("TotalVAT", invoice.TotalVAT);
// Modifying total values
invoice.TotalExclVAT = totalExclVAT;
invoice.TotalInclVAT = totalInclVAT;
invoice.TotalVAT = totalVAT;
return saveMap;
}
The solution above works well whenever the invoice and the invoice lines are both modified client side. I have a problem when no invoice is modified client side (only lines are modified). In this case I need to add the related invoice to the saveMap by getting it from the DB, which is what my code does, as you can see. But I also need to add keys to the OriginalValuesMap for the properties I modified manually here, and I cannot in this case because the dictionary object is null. So when I do...
ei.OriginalValuesMap.Add("TotalExclVAT", invoice.TotalExclVAT);
... on a null object (OriginalValuesMap), it doesn't work.
So my new problem is this: how do I add to the saveMap an entity that already exists in the DB? I don't want to mark this entity as ei = cp.CreateEntityInfo(acc, Breeze.WebApi.EntityState.Added); but rather as ei = cp.CreateEntityInfo(acc, Breeze.WebApi.EntityState.Modified);. In this case my OriginalValuesMap is null, and that seems to be the problem.
I hope what I'm trying to explain here is understandable.
Is there any reason not to use a triggered stored procedure for this? This would certainly be the simplest approach...
But... if there is, then the other approach would be to use 'BeforeSaveEntities' instead of 'BeforeSaveEntity' because this will give you access to the entire 'SaveMap'.
Then create a hashset of all of the invoiceLines for each modified invoice, constructed as the combination of your server-side query of invoice lines per invoice overlaid with the client-side invoiceLines associated with this invoice (from the SaveMap). Next, just total each hashset and use the result to update your 'Totalxxx' properties.
A little terse but hopefully this makes sense.
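The overlay-then-total idea can be sketched without Breeze or Entity Framework. Everything here is a hypothetical stand-in: Line mirrors InvoiceLine, ChangeKind mirrors the entity states in the SaveMap, and the handling of the withoutVat flag (no VAT at all when it is set) is an assumption, since the question elides that part.

```csharp
using System;
using System.Collections.Generic;

public enum ChangeKind { Added, Modified, Deleted }

public class Line
{
    public int Id { get; set; }
    public double Amount { get; set; }
    public double VatPercent { get; set; }
}

public static class InvoiceTotals
{
    public static (double ExclVat, double Vat, double InclVat) Compute(
        IEnumerable<Line> serverLines,
        IEnumerable<(ChangeKind Kind, Line Line)> clientChanges,
        bool withoutVat)
    {
        // Start from what the database currently holds, keyed by line id.
        var merged = new Dictionary<int, Line>();
        foreach (var line in serverLines)
            merged[line.Id] = line;

        // Overlay the pending client-side changes on top of the server state.
        foreach (var change in clientChanges)
        {
            if (change.Kind == ChangeKind.Deleted)
                merged.Remove(change.Line.Id);
            else
                merged[change.Line.Id] = change.Line; // Added or Modified
        }

        double exclVat = 0, vat = 0;
        foreach (var line in merged.Values)
        {
            exclVat += line.Amount;
            if (!withoutVat) // assumption: the flag suppresses VAT entirely
                vat += line.Amount * line.VatPercent / 100;
        }
        return (exclVat, vat, exclVat + vat);
    }
}
```

Using the indexer assignment (merged[id] = line) instead of Add avoids a duplicate-key exception when a modified line is already present from the server query.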

How to decorate a class item to be an index and get the same as using ensureIndex?

I'd like to define in the class declaration which items are indexes, something like:
public class MyClass {
public int SomeNum { get; set; }
[THISISANINDEX]
public string SomeProperty { get; set; }
}
so as to get the same effect as ensureIndex("SomeProperty").
Is this possible?
I think this is a nice idea, but you have to do this yourself, there's no built-in support for it. If you have an access layer you can do it in there. You'd need an attribute class, something like this;
[Flags]
public enum IndexConstraints
{
Normal = 0x00000001, // Ascending (default)
Descending = 0x00000010,
Unique = 0x00000100,
Sparse = 0x00001000, // allows nulls in the indexed fields
}
// Applied to a member
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public class EnsureIndexAttribute : EnsureIndexesAttribute
{
public EnsureIndexAttribute(IndexConstraints ic = IndexConstraints.Normal) : base(ic) { }
}
// Applied to a class
[AttributeUsage(AttributeTargets.Class)]
public class EnsureIndexesAttribute : Attribute
{
public bool Descending { get; private set; }
public bool Unique { get; private set; }
public bool Sparse { get; private set; }
public string[] Keys { get; private set; }
public EnsureIndexesAttribute(params string[] keys) : this(IndexConstraints.Normal, keys) {}
public EnsureIndexesAttribute(IndexConstraints ic, params string[] keys)
{
this.Descending = ((ic & IndexConstraints.Descending) != 0);
this.Unique = ((ic & IndexConstraints.Unique) != 0);
this.Sparse = ((ic & IndexConstraints.Sparse) != 0);
this.Keys = keys;
}
}//class EnsureIndexesAttribute
You could then apply attributes at either the class or member level as follows. I found that adding at member level was less likely to get out of sync with the schema compared to adding at the class level. You need to make sure of course that you get the actual element name as opposed to the C# member name;
[CollectionName("People")]
//[EnsureIndexes("k")]// doing it here would allow for multi-key configs
public class Person
{
[BsonElement("k")] // name mapping in the DB schema
[BsonIgnoreIfNull]
[EnsureIndex(IndexConstraints.Unique|IndexConstraints.Sparse)] // name is implicit here
public string userId{ get; protected set; }
// other properties go here
}
and then in your DB access implementation (or repository), you need something like this;
private void AssureIndexesNotInlinable()
{
// We can only index a collection if there's at least one element, otherwise it does nothing
if (this.collection.Count() > 0)
{
// Check for EnsureIndex Attribute
var theClass = typeof(T);
// Walk the members of the class to see if there are any directly attached index directives
foreach (var m in theClass.GetProperties(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.FlattenHierarchy))
{
List<string> elementNameOverride = new List<string>(1);
EnsureIndexesAttribute indexAttr = null;
// For each of the member's attributes
foreach (Attribute attr in m.GetCustomAttributes())
{
if (attr.GetType() == typeof(EnsureIndexAttribute))
indexAttr = (EnsureIndexAttribute)attr;
if (attr.GetType() == typeof(RepoElementAttribute))
elementNameOverride.Add(((RepoElementAttribute)attr).ElementName);
if ((indexAttr != null) && (elementNameOverride.Count != 0))
break;
}
// Index
if (indexAttr != null)
{
if (elementNameOverride.Count() > 0)
EnsureIndexesAsDeclared(indexAttr, elementNameOverride);
else
EnsureIndexesAsDeclared(indexAttr);
}
}
// Walk the atributes on the class itself. WARNING: We don't validate the member names here, we just create the indexes
// so if you create a unique index and don't have a field to match you'll get an exception as you try to add the second
// item with a null value on that key
foreach (Attribute attr in theClass.GetCustomAttributes(true))
{
if (attr.GetType() == typeof(EnsureIndexesAttribute))
EnsureIndexesAsDeclared((EnsureIndexesAttribute)attr);
}//foreach
}//if this.collection.count
}//AssureIndexesNotInlinable()
EnsureIndexesAsDeclared then looks like this;
private void EnsureIndexesAsDeclared(EnsureIndexesAttribute attr, List<string> indexFields = null)
{
var eia = attr as EnsureIndexesAttribute;
if (indexFields == null)
indexFields = eia.Keys.ToList();
// use driver specific methods to actually create this index on the collection
var db = GetRepositoryManager(); // if you have a repository or some other method of your own
db.EnsureIndexes(indexFields, attr.Descending, attr.Unique, attr.Sparse);
}//EnsureIndexesAsDeclared()
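The point about using the element name rather than the C# member name can be sketched without any driver. This is a minimal, self-contained illustration: IndexedAttribute and ElementNameAttribute are hypothetical stand-ins for the EnsureIndex and BsonElement/RepoElement attributes above, and IndexDiscovery is a name introduced only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

[AttributeUsage(AttributeTargets.Property)]
public class IndexedAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public class ElementNameAttribute : Attribute
{
    public string Name { get; private set; }
    public ElementNameAttribute(string name) { Name = name; }
}

public class SamplePerson
{
    [ElementName("k")] // stored as "k" in the database
    [Indexed]
    public string UserId { get; set; }

    [Indexed]
    public string Email { get; set; }

    public int Age { get; set; } // not indexed
}

public static class IndexDiscovery
{
    // Collects the database element names of all properties marked [Indexed].
    public static List<string> IndexedElementNames(Type type)
    {
        var names = new List<string>();
        foreach (PropertyInfo prop in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!Attribute.IsDefined(prop, typeof(IndexedAttribute)))
                continue;
            var rename = (ElementNameAttribute)Attribute.GetCustomAttribute(prop, typeof(ElementNameAttribute));
            // Prefer the mapped element name; fall back to the member name.
            names.Add(rename != null ? rename.Name : prop.Name);
        }
        return names;
    }
}
```

For SamplePerson this returns "k" and "Email" (reflection does not guarantee the order), which are the names an index-creation call would need.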
Note that you'd call the index check after every update, because if you forget it somewhere your indexes may not get created. It is therefore important to optimise the call so that it returns quickly when there is no indexing to do, before going through all that reflection code. Ideally you'd do this just once, or at the very least once per application startup. One way is to use a static flag to track whether you've already done so; you'd need additional lock protection around that, but, over-simplified, it looks something like this;
void AssureIndexes()
{
if (_requiresIndexing)
AssureIndexesNotInlinable(); // the slow path defined above
}
So that's the method you'll want in each and every DB update you make, which, if you're lucky would get inlined by the JIT optimizer as well.
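The "static flag plus lock protection" mentioned above could look roughly like the following. It is a sketch under assumptions, not the answer's actual code: IndexGuard is a hypothetical wrapper, and the delegate passed in stands for the reflection-heavy AssureIndexesNotInlinable method.

```csharp
using System;

public class IndexGuard
{
    private readonly object _gate = new object();
    private readonly Action _createIndexes;
    private volatile bool _requiresIndexing = true;

    public IndexGuard(Action createIndexes)
    {
        _createIndexes = createIndexes;
    }

    public void AssureIndexes()
    {
        // Fast path: a single volatile read once the indexes exist.
        if (!_requiresIndexing)
            return;

        lock (_gate)
        {
            // Re-check inside the lock so only one caller does the work.
            if (!_requiresIndexing)
                return;
            _createIndexes();
            _requiresIndexing = false; // cleared only after the work succeeded
        }
    }
}
```

Because the flag is cleared only after the delegate returns, a failed attempt is retried on the next update, and concurrent callers wait on the lock until the indexes are actually in place.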
See below for a naive implementation, which could do with some brains to take the indexing advice from the MongoDB documentation into consideration. Creating indexes based on the queries used within the application, instead of adding custom attributes to properties, might be another option.
using System;
using System.Reflection;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Driver;
using NUnit.Framework;
using SharpTestsEx;
namespace Mongeek
{
[TestFixture]
class TestDecorateToEnsureIndex
{
[Test]
public void ShouldIndexPropertyWithEnsureIndexAttribute()
{
var server = MongoServer.Create("mongodb://localhost");
var db = server.GetDatabase("IndexTest");
var boatCollection = db.GetCollection<Boat>("Boats");
boatCollection.DropAllIndexes();
var indexer = new Indexer();
indexer.EnsureThat(boatCollection).HasIndexesNeededBy<Boat>();
boatCollection.IndexExists(new[] { "Name" }).Should().Be.True();
}
}
internal class Indexer
{
private MongoCollection _mongoCollection;
public Indexer EnsureThat(MongoCollection mongoCollection)
{
_mongoCollection = mongoCollection;
return this;
}
public Indexer HasIndexesNeededBy<T>()
{
Type t = typeof (T);
foreach(PropertyInfo prop in t.GetProperties() )
{
if (Attribute.IsDefined(prop, typeof (EnsureIndexAttribute)))
{
_mongoCollection.EnsureIndex(new[] {prop.Name});
}
}
return this;
}
}
internal class Boat
{
public Boat(Guid id)
{
Id = id;
}
[BsonId]
public Guid Id { get; private set; }
public int Length { get; set; }
[EnsureIndex]
public string Name { get; set; }
}
internal class EnsureIndexAttribute : Attribute
{
}
}