On rare occasions we get a StackOverflowException when compiling LINQ to Entities queries. These exceptions happen completely at random and we have found no way to reproduce them. Even when we run the same code again after getting the exception, it usually works fine. So far we thought this might be related to the complexity of the queries, but now we got it on a really simple one:
var prozQuery = ctx.Set<Prozess>().Where(x => x.Id == prozess.Id);
var titelQuery = prozQuery
    .SelectMany(x => x.Personen)
    .Where(x => x.BssPersonenrolleEnu == 1 || x.BssPersonenrolleEnu == 5)
    .Select(x => x.Person)
    .SelectMany(pt => pt.BssTitel);
var titelOrgaQuery = titelQuery.Select(x => x.AusstellendeOrganisation).Where(x => x != null);
var titelOrganisationen = titelOrgaQuery.ToList(); // Overflow happens here
Last frames of the stack trace:
000000f13aa93470 00007ffe0c5ffa74 [FaultingExceptionFrame: 000000f13aa93470]
000000f13aa97748 00007ffe0c5ffa74 [HelperMethodFrame: 000000f13aa97748]
000000f13aa97870 00007ffdec2576a1 System.Text.StringBuilder.ExpandByABlock(Int32)
000000f13aa978d0 00007ffdec2575a9 System.Text.StringBuilder.Append(Char*, Int32)
000000f13aa97920 00007ffdec257539 System.Text.StringBuilder.AppendHelper(System.String)
000000f13aa97950 00007ffdec25f30e System.Text.StringBuilder.Append(System.String)
000000f13aa979a0 00007ffd97b9a7c4 System.Data.Entity.Core.Metadata.Edm.TypeUsage.BuildIdentity(System.Text.StringBuilder)
000000f13aa97a00 00007ffd9969bdfc System.Data.Entity.Core.Metadata.Edm.RowType.GetRowTypeIdentityFromProperties(System.Collections.Generic.IEnumerable`1, System.Data.Entity.Core.Objects.ELinq.InitializerMetadata)
000000f13aa97a70 00007ffd9969b9fe System.Data.Entity.Core.Metadata.Edm.RowType..ctor(System.Collections.Generic.IEnumerable`1, System.Data.Entity.Core.Objects.ELinq.InitializerMetadata)
000000f13aa97ac0 00007ffd9969b957 System.Data.Entity.Core.Common.TypeHelpers.CreateRowType(System.Collections.Generic.IEnumerable`1>, System.Data.Entity.Core.Objects.ELinq.InitializerMetadata)
000000f13aa97b30 00007ffd9976574f System.Data.Entity.Core.Common.TypeHelpers.CreateKeyRowType(System.Data.Entity.Core.Metadata.Edm.EntityTypeBase)
000000f13aa97ba0 00007ffd99765520 System.Data.Entity.Core.Query.PlanCompiler.ITreeGenerator.Visit(System.Data.Entity.Core.Common.CommandTrees.DbRefExpression)
000000f13aa97bf0 00007ffd996a939a System.Data.Entity.Core.Query.PlanCompiler.ITreeGenerator.Visit(System.Data.Entity.Core.Common.CommandTrees.DbNewInstanceExpression)
000000f13aa97c90 00007ffd99663f25 System.Data.Entity.Core.Query.PlanCompiler.ITreeGenerator.VisitExprAsScalar(System.Data.Entity.Core.Common.CommandTrees.DbExpression)
000000f13aa97cd0 00007ffd996609cf System.Data.Entity.Core.Query.PlanCompiler.ITreeGenerator.GenerateStandardProject(System.Data.Entity.Core.Common.CommandTrees.DbProjectExpression)
000000f13aa97d20 00007ffd9965ed67 System.Data.Entity.Core.Query.PlanCompiler.ITreeGenerator..ctor(System.Data.Entity.Core.Common.CommandTrees.DbQueryCommandTree, System.Data.Entity.Core.Mapping.ViewGeneration.DiscriminatorMap)
000000f13aa97da0 00007ffd996a8c5a System.Data.Entity.Core.Mapping.ViewGeneration.GeneratedView.GetInternalTree(System.Data.Entity.Core.Query.InternalTrees.Command)
000000f13aa97df0 00007ffd99669a88 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.ExpandView(System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.IsOfOp ByRef)
000000f13aa97e60 00007ffd996696a6 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.ProcessScanTable(System.Data.Entity.Core.Query.InternalTrees.Node, System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.IsOfOp ByRef)
000000f13aa97eb0 00007ffd99669606 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa97ee0 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa97f40 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa97fc0 00007ffd99669134 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.FilterOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98020 00007ffd9e4c4be9 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.BuildRelPropertyExpression(System.Data.Entity.Core.Metadata.Edm.EntitySetBase, System.Data.Entity.Core.Query.InternalTrees.RelProperty, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa980d0 00007ffd996acc54 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor+d__54.MoveNext()
000000f13aa98140 00007ffd996ab8a3 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.NewEntityOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa981f0 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98250 00007ffd99666f95 System.Data.Entity.Core.Query.InternalTrees.BasicOpVisitorOfNode.VisitDefault(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98280 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa982e0 00007ffd99666f95 System.Data.Entity.Core.Query.InternalTrees.BasicOpVisitorOfNode.VisitDefault(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98310 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98370 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa983f0 00007ffd99667847 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ProjectOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98470 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa984d0 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98550 00007ffd996ab40a System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanViewOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa985a0 00007ffd996696b1 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.ProcessScanTable(System.Data.Entity.Core.Query.InternalTrees.Node, System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.IsOfOp ByRef)
000000f13aa985f0 00007ffd99669606 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98620 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98680 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98700 00007ffd99669134 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.FilterOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98760 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa987c0 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98840 00007ffd99667847 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ProjectOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa988c0 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98920 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa989a0 00007ffd996ab40a System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanViewOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa989f0 00007ffd996696b1 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.ProcessScanTable(System.Data.Entity.Core.Query.InternalTrees.Node, System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.IsOfOp ByRef)
000000f13aa98a40 00007ffd99669606 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98a70 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98ad0 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98b50 00007ffd99669134 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.FilterOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98bb0 00007ffd9e4c4be9 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.BuildRelPropertyExpression(System.Data.Entity.Core.Metadata.Edm.EntitySetBase, System.Data.Entity.Core.Query.InternalTrees.RelProperty, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98c60 00007ffd996acc54 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor+d__54.MoveNext()
000000f13aa98cd0 00007ffd996ab8a3 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.NewEntityOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98d80 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98de0 00007ffd99666f95 System.Data.Entity.Core.Query.InternalTrees.BasicOpVisitorOfNode.VisitDefault(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98e10 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98e70 00007ffd99666f95 System.Data.Entity.Core.Query.InternalTrees.BasicOpVisitorOfNode.VisitDefault(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98ea0 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98f00 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa98f80 00007ffd99667847 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ProjectOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99000 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99060 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa990e0 00007ffd996ab40a System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanViewOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99130 00007ffd996696b1 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.ProcessScanTable(System.Data.Entity.Core.Query.InternalTrees.Node, System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.IsOfOp ByRef)
000000f13aa99180 00007ffd99669606 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa991b0 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99210 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99290 00007ffd99669134 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.FilterOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa992f0 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99350 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa993d0 00007ffd99667847 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ProjectOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99450 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa994b0 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99530 00007ffd996ab40a System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanViewOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99580 00007ffd996696b1 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.ProcessScanTable(System.Data.Entity.Core.Query.InternalTrees.Node, System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.IsOfOp ByRef)
000000f13aa995d0 00007ffd99669606 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99600 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99660 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa996e0 00007ffd99669134 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.FilterOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99740 00007ffd9e4c4be9 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.BuildRelPropertyExpression(System.Data.Entity.Core.Metadata.Edm.EntitySetBase, System.Data.Entity.Core.Query.InternalTrees.RelProperty, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa997f0 00007ffd996acc54 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor+d__54.MoveNext()
000000f13aa99860 00007ffd996ab8a3 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.NewEntityOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99910 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99970 00007ffd99666f95 System.Data.Entity.Core.Query.InternalTrees.BasicOpVisitorOfNode.VisitDefault(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa999a0 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99a00 00007ffd99666f95 System.Data.Entity.Core.Query.InternalTrees.BasicOpVisitorOfNode.VisitDefault(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99a30 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99a90 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99b10 00007ffd99667847 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ProjectOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99b90 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99bf0 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99c70 00007ffd996ab40a System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanViewOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99cc0 00007ffd996696b1 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.ProcessScanTable(System.Data.Entity.Core.Query.InternalTrees.Node, System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.IsOfOp ByRef)
000000f13aa99d10 00007ffd99669606 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99d40 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99da0 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99e20 00007ffd99669134 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.FilterOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99e80 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99ee0 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99f60 00007ffd99667847 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ProjectOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa99fe0 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa9a040 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa9a0c0 00007ffd996ab40a System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanViewOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa9a110 00007ffd996696b1 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.ProcessScanTable(System.Data.Entity.Core.Query.InternalTrees.Node, System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.IsOfOp ByRef)
000000f13aa9a160 00007ffd99669606 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.ScanTableOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa9a190 00007ffd99667016 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitChildren(System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa9a1f0 00007ffd996678a5 System.Data.Entity.Core.Query.PlanCompiler.SubqueryTrackingVisitor.VisitRelOpDefault(System.Data.Entity.Core.Query.InternalTrees.RelOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa9a270 00007ffd99669134 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.Visit(System.Data.Entity.Core.Query.InternalTrees.FilterOp, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa9a2d0 00007ffd9e4c4be9 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor.BuildRelPropertyExpression(System.Data.Entity.Core.Metadata.Edm.EntitySetBase, System.Data.Entity.Core.Query.InternalTrees.RelProperty, System.Data.Entity.Core.Query.InternalTrees.Node)
000000f13aa9a380 00007ffd996acc54 System.Data.Entity.Core.Query.PlanCompiler.PreProcessor+d__54.MoveNext()
I know how to remove outliers with the IQR using pandas, but now I'm trying to learn PySpark. I have done some searching online, but most of the examples only flag outliers with 'yes' and 'no' and do not go on to remove them. I also don't understand the code they use.
I have also tried a bit by myself, but to no avail. Ultimately, I don't know where to start.
Here's an example dataframe:
+----------+---------+--------------+---------------+----------------------+------------+
| town|flat_type| flat_model|remaining_lease|floor_area_sqm_imputed|resale_price|
+----------+---------+--------------+---------------+----------------------+------------+
|ANG MO KIO| 2 ROOM| Improved| 736| 44.0| 232000.0|
|ANG MO KIO| 3 ROOM|New Generation| 727| 67.0| 250000.0|
|ANG MO KIO| 3 ROOM|New Generation| 749| 67.0| 262000.0|
|ANG MO KIO| 3 ROOM|New Generation| 745| 68.0| 265000.0|
|ANG MO KIO| 3 ROOM|New Generation| 749| 67.0| 265000.0|
|ANG MO KIO| 3 ROOM|New Generation| 756| 68.0| 275000.0|
|ANG MO KIO| 3 ROOM|New Generation| 738| 68.0| 280000.0|
|ANG MO KIO| 3 ROOM|New Generation| 700| 67.0| 285000.0|
|ANG MO KIO| 3 ROOM|New Generation| 738| 68.0| 285000.0|
|ANG MO KIO| 3 ROOM|New Generation| 736| 67.0| 285000.0|
+----------+---------+--------------+---------------+----------------------+------------+
I plan on removing outliers only for floor_area_sqm_imputed, so I don't need code that handles multiple columns.
Any help is appreciated. I know it sounds like I just want answers instead of searching for myself.
Use the percentile_approx Spark SQL function to compute the first and third quartiles (Q1 and Q3), then keep only the records that fall within the range they define.
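import pyspark.sql.functions as F
from pyspark.sql import SparkSession
# Reuse the active session (or create one) so that `spark` is defined outside the PySpark shell.
spark = SparkSession.builder.getOrCreate()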
df = spark.createDataFrame(data=[["ANG MO KIO","2 ROOM","Improved",736,44.0,232000.0],["ANG MO KIO","3 ROOM","New Generation",727,67.0,250000.0],["ANG MO KIO","3 ROOM","New Generation",749,67.0,262000.0],["ANG MO KIO","3 ROOM","New Generation",745,68.0,265000.0],["ANG MO KIO","3 ROOM","New Generation",749,67.0,265000.0],["ANG MO KIO","3 ROOM","New Generation",756,68.0,275000.0],["ANG MO KIO","3 ROOM","New Generation",738,68.0,280000.0],["ANG MO KIO","3 ROOM","New Generation",700,67.0,285000.0],["ANG MO KIO","3 ROOM","New Generation",738,68.0,285000.0],["ANG MO KIO","3 ROOM","New Generation",736,67.0,285000.0]], schema=["town","flat_type","flat_model","remaining_lease","floor_area_sqm_imputed","resale_price"])
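# Compute the first and third quartiles of the column and bring them to the driver.
qtr_map = df.select(
    F.expr("percentile_approx(floor_area_sqm_imputed, 0.25) as Q1"),
    F.expr("percentile_approx(floor_area_sqm_imputed, 0.75) as Q3"),
).collect()[0].asDict()
# IQR rule: anything more than 1.5 * IQR below Q1 or above Q3 counts as an outlier.
iqr = qtr_map["Q3"] - qtr_map["Q1"]
lower = qtr_map["Q1"] - 1.5 * iqr
upper = qtr_map["Q3"] + 1.5 * iqr
# Keep only the rows inside the fences.
df = df.filter(
    (F.col("floor_area_sqm_imputed") >= lower)
    & (F.col("floor_area_sqm_imputed") <= upper)
)
df.show()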
Output:
+----------+---------+--------------+---------------+----------------------+------------+
| town|flat_type| flat_model|remaining_lease|floor_area_sqm_imputed|resale_price|
+----------+---------+--------------+---------------+----------------------+------------+
|ANG MO KIO| 3 ROOM|New Generation| 727| 67.0| 250000.0|
|ANG MO KIO| 3 ROOM|New Generation| 749| 67.0| 262000.0|
|ANG MO KIO| 3 ROOM|New Generation| 745| 68.0| 265000.0|
|ANG MO KIO| 3 ROOM|New Generation| 749| 67.0| 265000.0|
|ANG MO KIO| 3 ROOM|New Generation| 756| 68.0| 275000.0|
|ANG MO KIO| 3 ROOM|New Generation| 738| 68.0| 280000.0|
|ANG MO KIO| 3 ROOM|New Generation| 700| 67.0| 285000.0|
|ANG MO KIO| 3 ROOM|New Generation| 738| 68.0| 285000.0|
|ANG MO KIO| 3 ROOM|New Generation| 736| 67.0| 285000.0|
+----------+---------+--------------+---------------+----------------------+------------+
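As a cross-check on the sample data, here is a minimal plain-Python sketch of the conventional IQR rule, with fences at Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. The `quartile` helper is a hypothetical nearest-rank stand-in for percentile_approx, not Spark's actual algorithm, and the list is simply the floor_area_sqm_imputed column copied from the ten rows in the question. On these rows only the 44.0 value falls outside the fences, which matches the output above.

```python
import math

# floor_area_sqm_imputed values copied from the ten sample rows in the question
areas = [44.0, 67.0, 67.0, 68.0, 67.0, 68.0, 68.0, 67.0, 68.0, 67.0]


def quartile(values, p):
    """Nearest-rank percentile: the ceil(p * n)-th smallest value.

    Hypothetical helper for illustration only; it is not Spark's algorithm.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


q1 = quartile(areas, 0.25)
q3 = quartile(areas, 0.75)
iqr = q3 - q1

# Fences of the IQR rule: 1.5 * IQR below Q1 and above Q3
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

kept = [a for a in areas if lower <= a <= upper]
removed = [a for a in areas if not (lower <= a <= upper)]

print(q1, q3, lower, upper)  # 67.0 68.0 65.5 69.5
print(removed)               # [44.0]
```

On a real dataset the fences are usually much wider than the interval between Q1 and Q3, so far fewer rows are discarded than a plain Q1-to-Q3 filter would drop.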
We are using jBPM 4.4 as our third-party Business Process Management tool with Java 6.x. So far we have used it with an Oracle DB and it worked well, but now we want to run it with a PostgreSQL 12.x database.
So we integrated postgresql-42.2.19.jre6.jar (the JDBC driver) and tried to run it.
We encountered the error below at run time.
Can anyone suggest what needs to be done to resolve the issue, specifically with jBPM 4.4?
We have already set
<prop key="hibernate.connection.autocommit">false</prop>
But that did not resolve our issue.
2021-05-05 06:41:57,670 ERROR [o-8443-exec-154] .AbstractFlushingEventListener portaladmin#10.100.250.41 - Could not synchronize database state with session
org.hibernate.exception.GenericJDBCException: could not insert: [org.jbpm.pvm.internal.lob.Lob]
at org.hibernate.exception.SQLStateConverter.handledNonSpecificException(SQLStateConverter.java:126) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:114) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2295) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2688) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.action.EntityInsertAction.execute(EntityInsertAction.java:79) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:279) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:263) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:167) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:321) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.event.def.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:64) [hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.impl.SessionImpl.autoFlushIfRequired(SessionImpl.java:996) [hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1141) [hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.impl.QueryImpl.list(QueryImpl.java:102) [hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.jbpm.pvm.internal.query.AbstractQuery.execute(AbstractQuery.java:93) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.query.ProcessDefinitionQueryImpl.execute(ProcessDefinitionQueryImpl.java:67) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.query.AbstractQuery.untypedList(AbstractQuery.java:67) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.query.ProcessDefinitionQueryImpl.list(ProcessDefinitionQueryImpl.java:157) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.repository.ProcessDeployer.checkKey(ProcessDeployer.java:133) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.repository.ProcessDeployer.deploy(ProcessDeployer.java:92) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.repository.DeployerManager.deploy(DeployerManager.java:46) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.repository.RepositorySessionImpl.deploy(RepositorySessionImpl.java:62) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.cmd.DeployCmd.execute(DeployCmd.java:47) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.cmd.DeployCmd.execute(DeployCmd.java:33) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.svc.DefaultCommandService.execute(DefaultCommandService.java:42) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.tx.SpringCommandCallback.doInTransaction(SpringCommandCallback.java:45) [jbpm-pvm-4.4.jar:4.4]
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130) [spring-tx-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.jbpm.pvm.internal.tx.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:49) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.svc.EnvironmentInterceptor.executeInNewEnvironment(EnvironmentInterceptor.java:53) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.svc.EnvironmentInterceptor.execute(EnvironmentInterceptor.java:40) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.svc.RetryInterceptor.execute(RetryInterceptor.java:56) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.svc.SkipInterceptor.execute(SkipInterceptor.java:43) [jbpm-pvm-4.4.jar:4.4]
at org.jbpm.pvm.internal.repository.DeploymentImpl.deploy(DeploymentImpl.java:90) [jbpm-pvm-4.4.jar:4.4]
at com.abc.def.portal.processes.jbpm.JbpmProcessDefinitionRepository.deployProcess_aroundBody18(JbpmProcessDefinitionRepository.java:108) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessDefinitionRepository.deployProcess_aroundBody19$advice(JbpmProcessDefinitionRepository.java:92) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessDefinitionRepository.deployProcess_aroundBody20(JbpmProcessDefinitionRepository.java:1) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessDefinitionRepository.deployProcess_aroundBody22(JbpmProcessDefinitionRepository.java:106) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessDefinitionRepository.deployProcess_aroundBody23$advice(JbpmProcessDefinitionRepository.java:80) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessDefinitionRepository.deployProcess(JbpmProcessDefinitionRepository.java:1) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessService.deployProcess_aroundBody46(JbpmProcessService.java:178) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessService.deployProcess_aroundBody47$advice(JbpmProcessService.java:92) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessService.deployProcess_aroundBody48(JbpmProcessService.java:1) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessService.deployProcess_aroundBody50(JbpmProcessService.java:178) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessService.deployProcess_aroundBody51$advice(JbpmProcessService.java:80) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessService.deployProcess_aroundBody52(JbpmProcessService.java:1) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessService.deployProcess_aroundBody53$advice(JbpmProcessService.java:61) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.processes.jbpm.JbpmProcessService.deployProcess(JbpmProcessService.java:1) [com.abc.def.portal.processes-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.partner.client.task.TaskController.handleFormUpload_aroundBody128(TaskController.java:611) [TaskController.class:na]
at com.abc.def.portal.partner.client.task.TaskController.handleFormUpload_aroundBody129$advice(TaskController.java:58) [TaskController.class:na]
at com.abc.def.portal.partner.client.task.TaskController.handleFormUpload_aroundBody130(TaskController.java:1) [TaskController.class:na]
at com.abc.def.portal.partner.client.task.TaskController.handleFormUpload_aroundBody131$advice(TaskController.java:92) [TaskController.class:na]
at com.abc.def.portal.partner.client.task.TaskController.handleFormUpload_aroundBody132(TaskController.java:1) [TaskController.class:na]
at com.abc.def.portal.partner.client.task.TaskController.handleFormUpload_aroundBody134(TaskController.java:605) [TaskController.class:na]
at com.abc.def.portal.partner.client.task.TaskController.handleFormUpload_aroundBody135$advice(TaskController.java:102) [TaskController.class:na]
at com.abc.def.portal.partner.client.task.TaskController.handleFormUpload_aroundBody136(TaskController.java:1) [TaskController.class:na]
at com.abc.def.portal.partner.client.task.TaskController.handleFormUpload_aroundBody137$advice(TaskController.java:55) [TaskController.class:na]
at com.abc.def.portal.partner.client.task.TaskController.handleFormUpload(TaskController.java:1) [TaskController.class:na]
at com.abc.def.portal.partner.client.task.TaskController$$FastClassByCGLIB$$2349406.invoke(<generated>) [cglib-nodep-2.1_3.jar:na]
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149) [cglib-nodep-2.1_3.jar:na]
at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:67) [spring-security-core-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622) [spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at com.abc.def.portal.partner.client.task.TaskController$$EnhancerByCGLIB$$4f295537.handleFormUpload(<generated>) [cglib-nodep-2.1_3.jar:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.6.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) ~[na:1.6.0_45]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) ~[na:1.6.0_45]
at java.lang.reflect.Method.invoke(Method.java:597) ~[na:1.6.0_45]
at org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:213) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:126) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:96) [spring-webmvc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:617) [spring-webmvc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:578) [spring-webmvc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:80) [spring-webmvc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923) [spring-webmvc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852) [spring-webmvc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882) [spring-webmvc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789) [spring-webmvc-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:646) [servlet-api.jar:na]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727) [servlet-api.jar:na]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303) [catalina.jar:7.0.53]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [catalina.jar:7.0.53]
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) [catalina.jar:7.0.53]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [catalina.jar:7.0.53]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:369) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at com.abc.def.portal.partner.client.security.IncompleteUserProfileFilter.doFilterInternal_aroundBody4(IncompleteUserProfileFilter.java:108) [IncompleteUserProfileFilter.class:na]
at com.abc.def.portal.partner.client.security.IncompleteUserProfileFilter.doFilterInternal(IncompleteUserProfileFilter.java:89) [IncompleteUserProfileFilter.class:na]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:109) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:97) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:100) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:78) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:35) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:187) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at com.abc.def.portal.ui.servlet.SsoRequestHeaderAuthenticationFilter.doFilter_aroundBody2(SsoRequestHeaderAuthenticationFilter.java:63) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.ui.servlet.SsoRequestHeaderAuthenticationFilter.doFilter(SsoRequestHeaderAuthenticationFilter.java:58) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:105) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:79) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.session.ConcurrentSessionFilter.doFilter(ConcurrentSessionFilter.java:109) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:381) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:168) [spring-security-web-3.0.7.RELEASE.jar:3.0.7.RELEASE]
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:346) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:259) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) [catalina.jar:7.0.53]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [catalina.jar:7.0.53]
at com.abc.def.portal.partner.client.security.SSOAutoLoginFilter.doFilterInternal_aroundBody0(SSOAutoLoginFilter.java:67) [SSOAutoLoginFilter.class:na]
at com.abc.def.portal.partner.client.security.SSOAutoLoginFilter.doFilterInternal(SSOAutoLoginFilter.java:63) [SSOAutoLoginFilter.class:na]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) [catalina.jar:7.0.53]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [catalina.jar:7.0.53]
at com.abc.def.portal.ui.csrf.CsrfFilter.doFilterInternal_aroundBody0(CsrfFilter.java:86) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.ui.csrf.CsrfFilter.doFilterInternal(CsrfFilter.java:57) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) [catalina.jar:7.0.53]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [catalina.jar:7.0.53]
at com.abc.def.portal.ui.csrf.AjaxTimeoutFilter.doFilterInternal_aroundBody0(AjaxTimeoutFilter.java:45) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.ui.csrf.AjaxTimeoutFilter.doFilterInternal(AjaxTimeoutFilter.java:31) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) [catalina.jar:7.0.53]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [catalina.jar:7.0.53]
at com.abc.def.portal.ui.timing.TimingServletFilter.doFilter_aroundBody2(TimingServletFilter.java:71) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.ui.timing.TimingServletFilter.doFilter(TimingServletFilter.java:63) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) [catalina.jar:7.0.53]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [catalina.jar:7.0.53]
at com.abc.def.portal.ui.servlet.XFilter.doFilterInternal_aroundBody0(XFilter.java:56) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.ui.servlet.XFilter.doFilterInternal_aroundBody1$advice(XFilter.java:64) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at com.abc.def.portal.ui.servlet.XFilter.doFilterInternal(XFilter.java:51) [com.abc.def.portal.ui-2.1.NOPSE19C.1.jar:na]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) [catalina.jar:7.0.53]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [catalina.jar:7.0.53]
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) [spring-web-3.1.2.RELEASE.jar:3.1.2.RELEASE]
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) [catalina.jar:7.0.53]
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) [catalina.jar:7.0.53]
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) [catalina.jar:7.0.53]
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) [catalina.jar:7.0.53]
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:610) [catalina.jar:7.0.53]
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170) [catalina.jar:7.0.53]
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98) [catalina.jar:7.0.53]
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) [catalina.jar:7.0.53]
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) [catalina.jar:7.0.53]
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) [catalina.jar:7.0.53]
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040) [tomcat-coyote.jar:7.0.53]
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607) [tomcat-coyote.jar:7.0.53]
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313) [tomcat-coyote.jar:7.0.53]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [na:1.6.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [na:1.6.0_45]
at java.lang.Thread.run(Thread.java:662) [na:1.6.0_45]
Caused by: org.postgresql.util.PSQLException: Large Objects may not be used in auto-commit mode.
at org.postgresql.largeobject.LargeObjectManager.createLO(LargeObjectManager.java:284) ~[postgresql-42.2.19.jre6.jar:42.2.19.jre6]
at org.postgresql.largeobject.LargeObjectManager.createLO(LargeObjectManager.java:272) ~[postgresql-42.2.19.jre6.jar:42.2.19.jre6]
at org.postgresql.jdbc.PgPreparedStatement.createBlob(PgPreparedStatement.java:1159) ~[postgresql-42.2.19.jre6.jar:42.2.19.jre6]
at org.postgresql.jdbc.PgPreparedStatement.setBlob(PgPreparedStatement.java:1200) ~[postgresql-42.2.19.jre6.jar:42.2.19.jre6]
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.setBlob(NewProxyPreparedStatement.java:495) ~[c3p0-0.9.1.2.jar:0.9.1.2]
at org.hibernate.type.BlobType.set(BlobType.java:72) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.type.BlobType.nullSafeSet(BlobType.java:140) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.persister.entity.AbstractEntityPersister.dehydrate(AbstractEntityPersister.java:2025) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2271) ~[hibernate-core-3.3.1.GA.jar:3.3.1.GA]
... 160 common frames omitted
jbpm 4.4 is a very old version (the current release is 7.54), but you can still try updating your schema to use the bytea type for the columns that currently hold PostgreSQL large objects.
If you're using a JTA datasource, the auto-commit setting is always true and cannot be changed. Try switching to an xa-datasource.
Solution:
I was able to find a solution for this.
I changed the DB column to the bytea type in PostgreSQL and changed the JBPM 4.4 implementation to use byte[] instead of java.sql.Blob (in the org.jbpm.pvm.internal.lob.Lob class).
I am new to PySpark, and so far I find it a bit difficult to understand how it works, especially when you are used to libraries like pandas. But it seems to be the way to go for big data.
For my current ETL job, I have the following elements:
This is my rdd:
[
[
('SMSG', 'BKT'), ('SQNR', '00000004'), ('STNQ', '06'), ('TRNN', '000001'), ('SMSG', 'BKS'), ('SQNR', '00000005'), ('STNQ', '24'), ('DAIS', '171231'), ('TRNN', '000001'), ....
],
[
('SMSG', 'BKT'), ('SQNR', '00000024'), ('STNQ', '06'), ('TRNN', '000002'), ('NRID', ' '), ('TREC', '020'), ('TRNN', '000002'), ('NRID', ' '), ('TACN', '001'), ('CARF', ' '), ...
],
...
]
The raw data is a fixed-size text file.
What I want to do now is to group the tuples of each sublist by key (groupByKey).
The final result should be:
[
[
('SMSG_1', 'BKT'),('SMSG_2','BKS'),('SQNR_1', '00000004'),('SQNR_2', '00000005'),('STNQ_1','06'),('STNQ_2','24'),('TRNN', '000001'),('DAIS', '171231'),...
],
[
('SMSG', 'BKT'),('SQNR', '00000024'),('STNQ','06'),('TRNN', '000002'),('NRID', ' '), ('TREC', '020'), ('TACN', '001'), ('CARF', ' '),...
],
...
]
Basically, the rules are as follows:
1- If the keys are the same and the values are also the same, remove the duplicates.
2- If the keys are the same and the values are different, rename the keys by adding a suffix "_Number", where Number is the occurrence number of that key's value.
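The two rules can be sketched in plain Python for a single sublist, before involving Spark at all. This is a minimal sketch, not code from the question: the function name `dedupe_and_suffix` is hypothetical, and it assumes the suffix counts distinct values in order of first appearance, which is what the expected output above suggests.

```python
def dedupe_and_suffix(pairs):
    """Apply the two rules to one list of (key, value) tuples.

    Hypothetical helper: keys keep their first-appearance order, and a
    key only gets a numeric suffix when it has more than one distinct value.
    """
    distinct = {}  # key -> list of distinct values, in the order seen
    for key, value in pairs:
        values = distinct.setdefault(key, [])
        if value not in values:  # rule 1: drop exact duplicates
            values.append(value)

    result = []
    for key, values in distinct.items():
        if len(values) == 1:
            result.append((key, values[0]))
        else:  # rule 2: same key, different values
            for i, value in enumerate(values, start=1):
                result.append((f"{key}_{i}", value))
    return result


sample = [
    ("SMSG", "BKT"), ("SQNR", "00000004"), ("STNQ", "06"), ("TRNN", "000001"),
    ("SMSG", "BKS"), ("SQNR", "00000005"), ("STNQ", "24"), ("DAIS", "171231"),
    ("TRNN", "000001"),
]
flattened = dedupe_and_suffix(sample)
```

Because it works on one sublist at a time, a function like this could be passed to `rdd.map(...)` so that each transaction is processed independently.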
My code starts as follows:
def addBKT():
    ...
def prepareTrans():
    ...
if __name__ == '__main__':
    input_folder = '/Users/admin/Documents/Training/FR20180101HOT'
    rdd = sc.wholeTextFiles(input_folder).map(lambda x: x[1].split("BKT"))
    rdd = rdd.flatMap(prepareTrans).map(addBKT).map(lambda x: x.split("\n")).map(hot_to_flat_file_v2)
    print(rdd.take(1))
The print gives me (as shared before) the following list of lists of tuples. I am taking only one sublist here, but the full RDD has about 2000 sublists of tuples:
[
[
('SMSG', 'BKT'), ('SQNR', '00000004'), ('STNQ', '06'), ('TRNN', '000001'), ('SMSG', 'BKS'), ('SQNR', '00000005'), ('STNQ', '24'), ('DAIS', '171231'), ('TRNN', '000001'), ....
]
]
I first tried to reduce the nested lists as follows:
rdd = rdd.flatMap(lambda x:x).reduceByKey(list)
I was expecting a new list of lists without duplicates, with the tuples that have different values grouped under the same key. However, I am not able to do that.
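One likely reason the attempt above does not behave as hoped: a reducer is called with two values at a time and must return one combined value, whereas `list` accepts at most one argument. The plain-Python sketch below imitates that with `functools.reduce`; the name `merge_distinct` is hypothetical and the sample values are made up for illustration.

```python
from functools import reduce

values_for_key = ["BKT", "BKS", "BKT"]  # values seen for one key, e.g. 'SMSG'

# A reducer receives two values at a time, so passing `list` fails.
try:
    reduce(list, values_for_key)
    error = None
except TypeError as exc:
    error = exc


def merge_distinct(acc, value):
    """Two-argument reducer: accumulate distinct values in a list."""
    return acc if value in acc else acc + [value]


# Starting from an empty list, each value is folded into the accumulator.
grouped = reduce(merge_distinct, values_for_key, [])
```

Note that `reduceByKey` itself takes no initial value, so in Spark the values would need to be wrapped in lists first (or an aggregation with a zero value used instead). Flattening with `flatMap(lambda x: x)` also merges all sublists into one stream, so any grouping after it works across transactions rather than within each one.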
As a second step, I was planning to transform each tuple with multiple values into as many new tuples as there are values in the grouped tuple, i.e. ('Key', ['Value1', 'Value2']) becomes ('Key_1', 'Value1'),('Key_2', 'Value2').
Finally, the goal of all these transformations is to convert the final RDD to a DataFrame and store it in Parquet format.
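Since the sublists do not all contain the same keys, a tabular format needs one shared set of columns. The following is a rough plain-Python sketch of that alignment step; `to_records` is a hypothetical helper, and it assumes keys are already unique within each sublist (that is, after the renaming described above).

```python
def to_records(transactions):
    """Turn lists of (key, value) tuples into dicts sharing one column set.

    Hypothetical helper: columns are collected in first-appearance order
    and keys missing from a transaction are filled with None.
    """
    columns = []
    for pairs in transactions:
        for key, _ in pairs:
            if key not in columns:
                columns.append(key)

    records = [
        {**dict.fromkeys(columns), **dict(pairs)} for pairs in transactions
    ]
    return columns, records


# Made-up sample with two transactions that share only some keys.
columns, records = to_records([
    [("SMSG_1", "BKT"), ("TRNN", "000001"), ("DAIS", "171231")],
    [("SMSG", "BKT"), ("TRNN", "000002"), ("TREC", "020")],
])
```

Records shaped like this could then be passed, together with an explicit schema, to `spark.createDataFrame`, and the resulting DataFrame stored with `df.write.parquet(path)`.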
I really hope someone has done something like this before. I have spent a lot of time on this, but I could not make it work, nor could I find any example online.
Thank you for your help.
As you are new to Spark, you may not be aware of Spark DataFrames. A DataFrame is a higher-level abstraction than an RDD. Here I solved your problem using a PySpark DataFrame. Have a look at this, and don't hesitate to learn Spark DataFrames.
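from pyspark.sql import Window
from pyspark.sql.functions import asc, col, concat, dense_rank, lit
rdd1 = sc.parallelize([("SMSG", "BKT"), ("SMSG", "BKT"), ("SMSG", "BKS"), ('SQNR', '00000004'), ('SQNR', '00000005') ])
rddToDF = rdd1.toDF(["C1", "C2"])
rddToDF.show()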
+----+--------+
| C1| C2|
+----+--------+
|SMSG| BKT|
|SMSG| BKT|
|SMSG| BKS|
|SQNR|00000004|
|SQNR|00000005|
+----+--------+
DfRmDup = rddToDF.drop_duplicates() #Removing duplicates from Dataframe
DfRmDup.show()
+----+--------+
| C1| C2|
+----+--------+
|SQNR|00000004|
|SMSG| BKT|
|SQNR|00000005|
|SMSG| BKS|
+----+--------+
rank = DfRmDup.withColumn("rank", dense_rank().over(Window.partitionBy("C1").orderBy(asc("C2"))))
rank.show()
+----+--------+----+
| C1| C2|rank|
+----+--------+----+
|SQNR|00000004| 1|
|SQNR|00000005| 2|
|SMSG| BKS| 1|
|SMSG| BKT| 2|
+----+--------+----+
rank.withColumn("C1", concat(col("C1"), lit("_"), col("rank"))).drop("rank").show()
+------+--------+
| C1| C2|
+------+--------+
|SQNR_1|00000004|
|SQNR_2|00000005|
|SMSG_1| BKS|
|SMSG_2| BKT|
+------+--------+
#Converting back to RDD
rank.withColumn("C1", concat(col("C1"), lit("_"), col("rank"))).drop("rank").rdd.map(lambda x: (x[0],x[1])).collect()
[('SQNR_1', '00000004'),
('SQNR_2', '00000005'),
('SMSG_1', 'BKS'),
('SMSG_2', 'BKT')]
Thank you very much for the link, I followed the solution provided. The DataFrame was created successfully, which is great.
input_folder = '/Users/admin/Documents/Training/FR20180101HOT'
rdd_split = sc.wholeTextFiles(input_folder).map(lambda x: x[1].split("BKT"))
rdd_trans = rdd_split.flatMap(prepareTrans).map(addBKT).map(lambda x: x.split("\n")).map(hot_to_flat_file_v2)
#rdd_group = rdd_trans.map(lambda x : x[i] for i in range(len(x))).reduceByKey(lambda x, y: str(x) + ','+ str(y))
df = spark.read.options(inferSchema="true").csv(rdd_trans)
df.show(1)
The output looks something like this:
+--------+-------+--------+------------+--------+------+--------+----------+----...
| _c0| _c1| _c2| _c3| _c4| _c5| _c6| _c7| _c8| _c9| _c10| _c11| _c12| _c13| _c14| _c15| _c16| _c17| _c18| _c19| _c20| _c21| _c22| _c23| _c24| _c25| _c26| _c27| _c28| _c29| _c30| _c31| _c32| _c33| _c34| _c35| _c36| _c37| _c38| _c39| _c40| _c41| _c42| _c43| _c44| _c45| _c46| _c47| _c48| _c49| _c50| _c51| _c52| _c53| _c54| _c55| _c56| _c57| _c58| _c59| _c60| _c61| _c62| _c63| _c64| _c65| _c66| _c67| _c68| _c69| _c70| _c71| _c72| _c73| _c74| _c75| _c76| _c77| _c78| _c79| _c80| _c81| _c82| _c83| _c84| _c85| _c86| _c87| _c88| _c89| _c90| _c91| _c92| _c93| _c94| _c95| _c96| _c97| _c98| _c99| _c100| _c101| _c102| _c103| _c104| _c105| _c106| _c107| _c108| _c109| _c110| _c111| _c112| _c113| _c114|_c115| _c116|_c117| _c118|_c119| _c120| _c121| _c122| _c123| _c124| _c125| _c126| _c127| _c128| _c129| _c130| _c131| _c132|_c133| _c134| _c135| _c136| _c137| _c138| _c139| _c140| _c141| _c142| _c143| _c144| _c145| _c146| _c147| _c148| _c149| _c150|_c151| _c152|_c153| _c154|_c155| _c156| _c157| _c158| _c159| _c160| _c161| _c162|_c163| _c164| _c165| _c166| _c167| _c168|_c169| _c170| _c171| _c172| _c173| _c174| _c175| _c176| _c177| _c178| _c179| _c180| _c181| _c182| _c183| _c184| _c185| _c186|_c187| _c188|_c189| _c190|_c191| _c192| _c193| _c194| _c195| _c196| _c197| _c198| _c199| _c200| _c201| _c202| _c203| _c204|_c205| _c206| _c207| _c208| _c209| _c210| _c211| _c212| _c213| _c214| _c215| _c216| _c217| _c218| _c219| _c220| _c221| _c222|_c223| _c224|_c225| _c226|_c227| _c228| _c229| _c230| _c231| _c232| _c233| _c234| _c235| _c236| _c237| _c238| _c239| _c240|_c241| _c242| _c243| _c244| _c245| _c246| _c247| _c248| _c249| _c250| _c251| _c252| _c253| _c254| _c255| _c256| _c257| _c258|_c259| _c260|_c261| _c262|_c263| _c264| _c265| _c266| _c267| _c268| _c269| _c270|_c271| _c272| _c273| _c274|_c275| _c276|_c277| _c278| _c279| _c280| _c281| _c282| _c283| _c284| _c285| _c286| _c287| _c288| _c289| _c290| _c291| _c292| _c293| _c294|_c295| _c296| _c297| _c298| _c299| _c300| _c301| _c302|_c303| 
_c304| _c305| _c306| _c307| _c308|_c309| _c310| _c311| _c312|_c313| _c314|_c315| _c316|_c317| _c318| _c319| _c320| _c321| _c322| _c323| _c324| _c325| _c326| _c327| _c328| _c329| _c330| _c331| _c332| _c333| _c334|_c335| _c336| _c337| _c338| _c339| _c340| _c341| _c342| _c343| _c344| _c345| _c346| _c347| _c348| _c349| _c350| _c351| _c352| _c353| _c354| _c355| _c356| _c357| _c358| _c359| _c360|_c361| _c362|_c363| _c364|_c365| _c366| _c367| _c368| _c369| _c370| _c371| _c372| _c373| _c374| _c375| _c376|_c377| _c378| _c379| _c380| _c381| _c382| _c383| _c384| _c385| _c386| _c387| _c388| _c389| _c390| _c391| _c392| _c393| _c394| _c395| _c396| _c397| _c398| _c399| _c400| _c401| _c402| _c403| _c404| _c405| _c406| _c407| _c408| _c409| _c410|_c411| _c412|_c413| _c414|_c415| _c416| _c417| _c418| _c419| _c420| _c421| _c422| _c423| _c424| _c425| _c426|_c427| _c428| _c429| _c430| _c431| _c432| _c433| _c434| _c435| _c436| _c437| _c438| _c439| _c440| _c441| _c442| _c443| _c444| _c445| _c446| _c447| _c448| _c449| _c450| _c451| _c452| _c453| _c454| _c455| _c456| _c457| _c458| _c459| _c460|_c461| _c462|_c463| _c464|_c465| _c466| _c467| _c468| _c469| _c470| _c471| _c472| _c473| _c474| _c475| _c476|_c477| _c478| _c479| _c480| _c481| _c482| _c483| _c484| _c485| _c486| _c487| _c488| _c489| _c490| _c491| _c492| _c493| _c494| _c495| _c496| _c497| _c498| _c499| _c500| _c501| _c502| _c503| _c504| _c505| _c506| _c507| _c508| _c509| _c510|_c511| _c512|_c513| _c514|_c515| _c516| _c517| _c518| _c519| _c520| _c521| _c522| _c523| _c524| _c525| _c526|_c527| _c528| _c529| _c530| _c531| _c532| _c533| _c534| _c535| _c536| _c537| _c538| _c539| _c540| _c541| _c542| _c543| _c544| _c545| _c546| _c547| _c548| _c549| _c550| _c551| _c552| _c553| _c554| _c555| _c556| _c557| _c558| _c559| _c560|_c561| _c562| _c563| _c564|_c565| _c566| _c567| _c568| _c569| _c570| _c571| _c572|_c573| _c574| _c575| _c576|_c577| _c578|_c579| _c580| _c581| _c582| _c583| _c584| _c585| _c586| _c587| _c588| _c589| _c590| _c591| _c592| 
_c593| _c594| _c595| _c596|_c597| _c598| _c599| _c600| _c601| _c602| _c603| _c604| _c605| _c606| _c607| _c608| _c609| _c610| _c611| _c612| _c613| _c614| _c615| _c616| _c617| _c618| _c619| _c620|_c621| _c622|_c623| _c624| _c625| _c626| _c627| _c628| _c629| _c630| _c631| _c632| _c633| _c634| _c635| _c636| _c637| _c638| _c639| _c640|_c641| _c642|_c643| _c644| _c645| _c646| _c647| _c648| _c649| _c650| _c651| _c652| _c653| _c654| _c655| _c656| _c657| _c658| _c659| _c660|_c661| _c662|_c663| _c664| _c665| _c666| _c667| _c668| _c669| _c670| _c671| _c672| _c673| _c674| _c675| _c676| _c677| _c678| _c679| _c680| _c681| _c682| _c683| _c684| _c685| _c686| _c687| _c688| _c689| _c690| _c691| _c692| _c693| _c694| _c695| _c696|_c697| _c698| _c699| _c700| _c701|
+--------+-------+--------+------------+--------+------+--------+----------+-------...
|[('SMSG'| 'BKT')| ('SQNR'| '00000004')| ('STNQ'| '06')| ('TRNN'| '000001')| ('NRID'| ' ')| ('TREC'| '020')| ('TACN'| '001')| ('CARF'| ' ')| ('CSTF'| ' ...| ('RPSI'| 'SABR')| ('ESAC'| ' ')| ('DISI'| ' ')| ('NRMI'| ' ')| ('NRCT'| ' ')| ('AREI'| ' ')| ('RESD'| ' ...| ('SMSG'| 'BKS')| ('SQNR'| '00000005')| ('STNQ'| '24')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('CPUI'| 'FFFF')| ('CJCP'| ' ')| ('AGTN'| '20212146')| ('RFIC'| ' ')| ('TOUR'| ' ')| ('TRNC'| 'TKTT')| ('TODC'| 'CDGCDG ')| ('PNRR'| 'IKQOWZ/AA ')| ('TIIS'| '0000')| ('RESD'| ' ...| ('SMSG'| 'BKS')| ('SQNR'| '00000006')| ('STNQ'| '30')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('COBL'| 225.0)| ('NTFA'| 0.0)| ('TMFT_1'| 'YR ')| ('TMFA_1'| 300.0)| ('TMFT_2'| 'FR ')| ('TMFA_2'| 20.81)| ('TMFT_3'| 'QX ')| ('TMFA_3'| 27.91)| ('TDAM'| 712.92)| ('RESD'| ' ')| ('CUTP'| 'EUR2')| ('SMSG'| 'BKS')| ('SQNR'| '00000007')| ('STNQ'| '30')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('COBL'| 0.0)| ('NTFA'| 0.0)| ('TMFT_1'| 'IZ ')| ('TMFA_1'| 4.51)| ('TMFT_2'| 'YC ')| ('TMFA_2'| 9.22)| ('TMFT_3'| 'XY ')| ('TMFA_3'| 11.74)| ('TDAM'| 0.0)| ('RESD'| ' ')| ('CUTP'| 'EUR2')| ('SMSG'| 'BKS')| ('SQNR'| '00000008')| ('STNQ'| '30')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('COBL'| 0.0)| ('NTFA'| 0.0)| ('TMFT_1'| 'XA ')| ('TMFA_1'| 6.64)| ('TMFT_2'| 'AY ')| ('TMFA_2'| 9.4)| ('TMFT_3'| 'WD ')| ('TMFA_3'| 29.33)| ('TDAM'| 0.0)| ('RESD'| ' ')| ('CUTP'| 'EUR2')| ('SMSG'| 'BKS')| ('SQNR'| '00000009')| ('STNQ'| '30')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('COBL'| 0.0)| ('NTFA'| 0.0)| ('TMFT_1'| 'EK ')| ('TMFA_1'| 18.89)| ('TMFT_2'| 'EL ')| ('TMFA_2'| 4.19)| ('TMFT_3'| 'HG ')| ('TMFA_3'| 16.76)| ('TDAM'| 0.0)| ('RESD'| ' ')| ('CUTP'| 'EUR2')| ('SMSG'| 'BKS')| ('SQNR'| '00000010')| ('STNQ'| '30')| ('DAIS'| '171231')| ('TRNN'| '000001')| 
('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('COBL'| 0.0)| ('NTFA'| 0.0)| ('TMFT_1'| 'JT ')| ('TMFA_1'| 2.52)| ('TMFT_2'| 'UC ')| ('TMFA_2'| 6.72)| ('TMFT_3'| 'QK ')| ('TMFA_3'| 16.76)| ('TDAM'| 0.0)| ('RESD'| ' ')| ('CUTP'| 'EUR2')| ('SMSG'| 'BKS')| ('SQNR'| '00000011')| ('STNQ'| '30')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('COBL'| 0.0)| ('NTFA'| 0.0)| ('TMFT_1'| 'XF ')| ('TMFA_1'| 2.52)| ('TMFT_2'| 'XFCLT3 ')| ('TMFA_2'| 0.0)| ('TMFT_3'| ' ')| ('TMFA_3'| 0.0)| ('TDAM'| 0.0)| ('RESD'| ' ')| ('CUTP'| 'EUR2')| ('SMSG'| 'BKS')| ('SQNR'| '00000012')| ('STNQ'| '39')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('STAT'| 'I ')| ('COTP'| ' ')| ('CORT'| '00000')| ('COAM'| 0.0)| ('SPTP'| ' ')| ('SPRT'| '00000')| ('SPAM'| 0.0)| ('EFRT'| '00000')| ('EFCO'| 0.0)| ('APBC'| 0.0)| ('RDII'| ' ')| ('RESD'| ' ...| ('CUTP'| 'EUR2')| ('SMSG'| 'BKS')| ('SQNR'| '00000013')| ('STNQ'| '46')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('ORIT'| ' ')| ('ORIL'| ' ')| ('ORID'| ' ')| ('ORIA'| '00000000')| ('ENRS'| 'NONREF/RESTRICT...| ('RESD'| ' ')| ('SMSG'| 'BKI')| ('SQNR'| '00000014')| ('STNQ'| '63')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('SEGI'| '1')| ('STPO'| 'X')| ('NBDA'| '22APR')| ('NADA'| '22APR')| ('ORAC'| 'CDG ')| ('DSTC'| 'MIA ')| ('CARR'| 'AA ')| ('CABI'| ' ')| ('FTNR'| ' 63 ')| ('RBKD'| 'O ')| ('FTDA'| '22APR')| ('FTDT'| '1155 ')| ('FBST'| 'OK')| ('FBAL'| '1PC')| ('FBTD'| 'OLN0DMN3 ')| ('FFRF'| ' ...| ('FCPT'| ' ')| ('RESD'| ' ')| ('SMSG'| 'BKI')| ('SQNR'| '00000015')| ('STNQ'| '63')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('SEGI'| '2')| ('STPO'| 'O')| ('NBDA'| '22APR')| ('NADA'| '22APR')| ('ORAC'| 'MIA ')| ('DSTC'| 'MBJ ')| ('CARR'| 'AA ')| ('CABI'| ' ')| ('FTNR'| '1515 ')| ('RBKD'| 'O ')| ('FTDA'| '22APR')| ('FTDT'| '1801 ')| ('FBST'| 'OK')| ('FBAL'| '1PC')| ('FBTD'| 
'OLN0DMN3 ')| ('FFRF'| ' ...| ('FCPT'| ' ')| ('RESD'| ' ')| ('SMSG'| 'BKI')| ('SQNR'| '00000016')| ('STNQ'| '63')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('SEGI'| '3')| ('STPO'| 'X')| ('NBDA'| '29APR')| ('NADA'| '29APR')| ('ORAC'| 'MBJ ')| ('DSTC'| 'CLT ')| ('CARR'| 'AA ')| ('CABI'| ' ')| ('FTNR'| ' 844 ')| ('RBKD'| 'O ')| ('FTDA'| '29APR')| ('FTDT'| '1059 ')| ('FBST'| 'OK')| ('FBAL'| '1PC')| ('FBTD'| 'OLN0DMN3 ')| ('FFRF'| ' ...| ('FCPT'| ' ')| ('RESD'| ' ')| ('SMSG'| 'BKI')| ('SQNR'| '00000017')| ('STNQ'| '63')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('SEGI'| '4')| ('STPO'| ' ')| ('NBDA'| '29APR')| ('NADA'| '29APR')| ('ORAC'| 'CLT ')| ('DSTC'| 'CDG ')| ('CARR'| 'AA ')| ('CABI'| ' ')| ('FTNR'| ' 786 ')| ('RBKD'| 'O ')| ('FTDA'| '29APR')| ('FTDT'| '1630 ')| ('FBST'| 'OK')| ('FBAL'| '1PC')| ('FBTD'| 'OLN0DMN3 ')| ('FFRF'| ' ...| ('FCPT'| ' ')| ('RESD'| ' ')| ('SMSG'| 'BAR')| ('SQNR'| '00000018')| ('STNQ'| '64')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('FARE'| 'EUR 225.00')| ('TKMI'| '/')| ('EQFR'| ' ')| ('TOTL'| 'EUR 712.92')| ('SASI'| '0011')| ('FCMI'| '0')| ('BAID'| ' ')| ('BEOT'| ' ')| ('FCPI'| '0')| ('AENT'| ' ')| ('RESD'| ' ...| ('SMSG'| 'BAR')| ('SQNR'| '00000019')| ('STNQ'| '65')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('PXNM'| ' ...| ('PXDA'| ' ...| ('DOBR'| '02APR68')| ('PXTP'| ' ')| ('RESD'| ' ')| ('SMSG'| 'BAR')| ('SQNR'| '00000020')| ('STNQ'| '66')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('FPSN'| '1')| ('FPIN'| 'AA132193 ...| ('RESD'| ' ...| ('SMSG'| 'BKF')| ('SQNR'| '00000021')| ('STNQ'| '81')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| '0015131574285 ')| ('CDGT'| '2')| ('FRCS'| '1')| ('FRCA'| 'PAR AA X/MIA AA...| ('RESD'| ' ')| ('SMSG'| 'BKF')| ('SQNR'| '00000022')| ('STNQ'| '81')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('TDNR'| 
'0015131574285 ')| ('CDGT'| '2')| ('FRCS'| '2')| ('FRCA'| '1IZ9.22YC11.74X...| ('RESD'| ' ')| ('SMSG'| 'BKP')| ('SQNR'| '00000023')| ('STNQ'| '84')| ('DAIS'| '171231')| ('TRNN'| '000001')| ('FPTP'| 'CA ')| ('FPAM'| 712.92)| ('FPAC'| ' ...| ('EXDA'| ' ')| ('EXPC'| ' ')| ('APLC'| ' ')| ('INVN'| ' ')| ('INVD'| '000000')| ('REMT'| 712.92)| ('CVVR'| ' ')| ('RESD'| ' ...| ('CUTP'| 'EUR2')]|
+--------+-------+--------+------------+--------+------+--------+----------+-------...
I think I still need to go through each pair of columns, rename the second column of the pair to the value in the first row of the first column, and finally drop the first column of every pair.
Or is it possible to add more options to:
df = spark.read.options(inferSchema="true").csv(rdd_trans)
to get the correct DataFrame structure directly? That would save processing time (my goal is to be faster than the pandas version).
In the meantime, I tried:
df.write.parquet("/Users/admin/Documents/Training/FR20180101HOT.parquet")
But got an error:
Py4JJavaError: An error occurred while calling o447851.parquet.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:196)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
...
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 8220.0 failed 1 times, most recent failure: Lost task 0.0 in stage 8220.0 (TID 12712, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows.
...
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
...
I can't post the whole error message because of the text limit, but it seems to be a memory issue.
I did a count for the df:
print(df.count())
15723
This equals the number of rows in my pandas version (separate Python code that does not use PySpark), so the row count is right. In pandas, however, I can write the data to Parquet without a problem.
You can try regexp_replace to strip the parentheses and quotes.
See the example below:
from pyspark.sql.functions import regexp_replace
df1.withColumn("c0", regexp_replace("_c0", "[()']", "")).withColumn("c1", regexp_replace("_c1", r"\)", "")).show()
+----+---+---+---+
| _c0|_c1| c0| c1|
+----+---+---+---+
|('a'| 2)| a| 2|
|('b'| 4)| b| 4|
|('c'| 6)| c| 6|
+----+---+---+---+
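The character class used above can be checked in plain Python before running it on Spark. This is only a sketch: the sample fragments, `clean` and `pairs_to_record` are hypothetical names made up for illustration, and it assumes each row really is an alternating key/value sequence as in the question.

```python
import re

# Hypothetical sample row: alternating key/value fragments, as they look
# after the raw text has been split on the column separator.
raw_row = ["('SMSG'", " 'BKT')", "('SQNR'", " '00000004')", "('COBL'", " 225.0)"]

def clean(fragment):
    """Drop parentheses and single quotes, then trim surrounding spaces."""
    return re.sub(r"[()']", "", fragment).strip()

def pairs_to_record(fragments):
    """Turn [key, value, key, value, ...] into a {key: value} dict."""
    cleaned = [clean(f) for f in fragments]
    return dict(zip(cleaned[0::2], cleaned[1::2]))

record = pairs_to_record(raw_row)
print(record)
```

Note that `strip()` also empties values that consist only of spaces, which may or may not be what you want for the blank fields.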
I'm tagging text to search for nouns and adjectives:
text = u"""Developed at the Vaccine and Gene Therapy Institute at the Oregon Health and Science University (OHSU), the vaccine proved successful in about fifty percent of the subjects tested and could lead to a human vaccine preventing the onset of HIV/AIDS and even cure patients currently on anti-retroviral drugs."""
nltk.pos_tag(nltk.word_tokenize(text))
This results in:
[('Developed', 'NNP'), ('at', 'IN'), ('the', 'DT'), ('Vaccine',
'NNP'), ('and', 'CC'), ('Gene', 'NNP'), ('Therapy', 'NNP'),
('Institute', 'NNP'), ('at', 'IN'), ('the', 'DT'), ('Oregon', 'NNP'),
('Health', 'NNP'), ('and', 'CC'), ('Science', 'NNP'), ('University',
'NNP'), ('(', 'NNP'), ('OHSU', 'NNP'), (')', 'NNP'), (',',
','), ('the', 'DT'), ('vaccine', 'NN'), ('proved', 'VBD'),
('successful', 'JJ'), ('in', 'IN'), ('about', 'IN'), ('fifty', 'JJ'),
('percent', 'NN'), ('of', 'IN'), ('the', 'DT'), ('subjects', 'NNS'),
('tested', 'VBD'), ('and', 'CC'), ('could', 'MD'), ('lead', 'VB'),
('to', 'TO'), ('a', 'DT'), ('human', 'NN'), ('vaccine', 'NN'),
('preventing', 'VBG'), ('the', 'DT'), ('onset', 'NN'), ('of', 'IN'),
('HIV/AIDS', 'NNS'), ('and', 'CC'), ('even', 'RB'), ('cure', 'NN'),
('patients', 'NNS'), ('currently', 'RB'), ('on', 'IN'),
('anti-retroviral', 'JJ'), ('drugs', 'NNS'), ('.', '.')]
Is there a built-in way of correctly detecting parentheses when tagging sentences?
If you know which tag you want to return for the parentheses, you can use a RegexpTagger to match them and fall back to the preferred tagger for everything else.
import nltk
from nltk.data import load
_POS_TAGGER = 'taggers/maxent_treebank_pos_tagger/english.pickle'
tagger = load(_POS_TAGGER) # same tagger as using nltk.pos_tag
regexp_tagger = nltk.tag.RegexpTagger([(r'\(|\)', '--')], backoff = tagger)
regexp_tagger.tag(nltk.word_tokenize(text))
Result:
[(u'Developed', 'NNP'), (u'at', 'IN'), (u'the', 'DT'), (u'Vaccine',
'NNP'), (u'and', 'CC'), (u'Gene', 'NNP'), (u'Therapy', 'NNP'),
(u'Institute', 'NNP'), (u'at', 'IN'), (u'the', 'DT'), (u'Oregon',
'NNP'), (u'Health', 'NNP'), (u'and', 'CC'), (u'Science', 'NNP'),
(u'University', 'NNP'), (u'(', '--'), (u'OHSU', 'NNP'), (u')', '--'),
(u',', ','), (u'the', 'DT'), (u'vaccine', 'NN'), (u'proved', 'VBD'),
(u'successful', 'JJ'), (u'in', 'IN'), (u'about', 'IN'), (u'fifty',
'JJ'), (u'percent', 'NN'), (u'of', 'IN'), (u'the', 'DT'),
(u'subjects', 'NNS'), (u'tested', 'VBD'), (u'and', 'CC'), (u'could',
'MD'), (u'lead', 'VB'), (u'to', 'TO'), (u'a', 'DT'), (u'human', 'NN'),
(u'vaccine', 'NN'), (u'preventing', 'VBG'), (u'the', 'DT'), (u'onset',
'NN'), (u'of', 'IN'), (u'HIV/AIDS', 'NNS'), (u'and', 'CC'), (u'even',
'RB'), (u'cure', 'NN'), (u'patients', 'NNS'), (u'currently', 'RB'),
(u'on', 'IN'), (u'anti-retroviral', 'JJ'), (u'drugs', 'NNS'), (u'.',
'.')]
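To make the "regex first, back off otherwise" idea concrete without needing NLTK installed, here is a rough plain-Python sketch. It is not how NLTK implements it: `fallback_tagger` is a hypothetical stand-in that tags every token 'NN', and `tag_with_regex_first` is a made-up helper name.

```python
import re

def fallback_tagger(tokens):
    """Hypothetical stand-in for a real tagger: tags every token 'NN'."""
    return [(tok, "NN") for tok in tokens]

def tag_with_regex_first(tokens, patterns, backoff):
    """Use the regex tag when a pattern matches the whole token,
    otherwise keep the tag produced by the backoff tagger."""
    compiled = [(re.compile(p), tag) for p, tag in patterns]
    result = []
    for tok, backoff_tag in backoff(tokens):
        for regex, tag in compiled:
            if regex.fullmatch(tok):
                result.append((tok, tag))
                break
        else:
            # No pattern matched: fall back to the other tagger's answer.
            result.append((tok, backoff_tag))
    return result

tokens = ["University", "(", "OHSU", ")", ","]
tagged = tag_with_regex_first(tokens, [(r"\(|\)", "--")], fallback_tagger)
print(tagged)
```

Only the two parentheses get the '--' tag; every other token keeps whatever the fallback decided.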
I have to resample the following cell array:
dateS =
'2004-09-02 06:00:00'
'2004-09-02 07:30:00'
'2004-09-02 12:00:00'
'2004-09-02 18:00:00'
'2004-09-02 19:30:00'
'2004-09-03 00:00:00'
'2004-09-03 05:30:00'
'2004-09-03 06:00:00'
following an irregular spacing, e.g. between the 1st and 2nd rows there are 5 readings, while between the 2nd and 3rd there are 10. The number of intermediate 'readings' is stored in a vector 'v'. So what I need is a new vector with all the intermediate dates/times in the same format as dateS.
EDIT:
There's 1h30min = 90 min between the first 2 readings in the list. Five intervals between them amount to 90 min / 5 = 18 min each. Now insert the intermediate 'readings' between (1) and (2), each separated by 18 min. I need to do that for all of dateS.
Any ideas? Thanks!
You can interpolate the serial dates with interp1():
% Inputs
dates = [
'2004-09-02 06:00:00'
'2004-09-02 07:30:00'
'2004-09-02 12:00:00'
'2004-09-02 18:00:00'
'2004-09-02 19:30:00'
'2004-09-03 00:00:00'
'2004-09-03 05:30:00'
'2004-09-03 06:00:00'];
v = [5 4 3 2 4 5 3];
% Serial dates
serdates = datenum(dates,'yyyy-mm-dd HH:MM:SS');
% Interpolate
x = cumsum([1 v]);
resampled = interp1(x, serdates, x(1):x(end))';
The result:
datestr(resampled)
ans =
02-Sep-2004 06:00:00
02-Sep-2004 06:18:00
02-Sep-2004 06:36:00
02-Sep-2004 06:54:00
02-Sep-2004 07:12:00
02-Sep-2004 07:30:00
02-Sep-2004 08:37:30
02-Sep-2004 09:45:00
02-Sep-2004 10:52:30
02-Sep-2004 12:00:00
02-Sep-2004 14:00:00
02-Sep-2004 16:00:00
02-Sep-2004 18:00:00
02-Sep-2004 18:45:00
02-Sep-2004 19:30:00
02-Sep-2004 20:37:30
02-Sep-2004 21:45:00
02-Sep-2004 22:52:30
03-Sep-2004 00:00:00
03-Sep-2004 01:06:00
03-Sep-2004 02:12:00
03-Sep-2004 03:18:00
03-Sep-2004 04:24:00
03-Sep-2004 05:30:00
03-Sep-2004 05:40:00
03-Sep-2004 05:50:00
03-Sep-2004 06:00:00
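For anyone who wants to see what the cumsum/interp1 combination does step by step, here is a rough Python sketch of the same idea on the first three dates only. The helper `interpolate` and the 0-based `knots` list are my own names, not part of the answer above; `knots` plays the role of `cumsum([1 v])`, shifted to start at 0.

```python
from datetime import datetime
from itertools import accumulate

FMT = "%Y-%m-%d %H:%M:%S"
dates = ["2004-09-02 06:00:00", "2004-09-02 07:30:00", "2004-09-02 12:00:00"]
v = [5, 4]  # number of intervals between consecutive dates

times = [datetime.strptime(d, FMT) for d in dates]
# Output position of each original date: [0, 5, 9]
knots = list(accumulate([0] + v))

def interpolate(position):
    """Linearly interpolate the time at an integer output position."""
    for i in range(len(knots) - 1):
        if knots[i] <= position <= knots[i + 1]:
            span = times[i + 1] - times[i]
            return times[i] + span * (position - knots[i]) / (knots[i + 1] - knots[i])
    raise ValueError("position outside the knot range")

resampled = [interpolate(p) for p in range(knots[-1] + 1)]
for t in resampled:
    print(t.strftime(FMT))
```

Each original date sits at a known position in the output, and every position in between is filled by linear interpolation, which is what interp1 does with the serial dates.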
The following code does what you want (I picked arbitrary values for v - as long as the number of elements in vector v is one less than the number of entries in dateS this should work):
dateS = [
'2004-09-02 06:00:00'
'2004-09-02 07:30:00'
'2004-09-02 12:00:00'
'2004-09-02 18:00:00'
'2004-09-02 19:30:00'
'2004-09-03 00:00:00'
'2004-09-03 05:30:00'
'2004-09-03 06:00:00'];
% "stations":
v = [6 5 4 3 5 6 4];
dn = datenum(dateS);
df = diff(dn)'./v;
newDates = [];
for ii = 1:numel(v)
% stop one step short of the next date so shared endpoints are not duplicated
newDates = [newDates dn(ii) + (0:v(ii)-1)*df(ii)];
end
newStrings = datestr(newDates, 'yyyy-mm-dd HH:MM:SS');
The array newStrings ends up containing the following. For example, you can see that the interval between the first and second times has been split into six 15-minute segments:
2004-09-02 06:00:00
2004-09-02 06:15:00
2004-09-02 06:30:00
2004-09-02 06:45:00
2004-09-02 07:00:00
2004-09-02 07:15:00
2004-09-02 07:30:00
2004-09-02 08:24:00
2004-09-02 09:18:00
2004-09-02 10:12:00
2004-09-02 11:06:00
2004-09-02 12:00:00
2004-09-02 13:30:00
2004-09-02 15:00:00
2004-09-02 16:30:00
2004-09-02 18:00:00
2004-09-02 18:30:00
2004-09-02 19:00:00
2004-09-02 19:30:00
2004-09-02 20:24:00
2004-09-02 21:18:00
2004-09-02 22:12:00
2004-09-02 23:06:00
2004-09-03 00:00:00
2004-09-03 00:55:00
2004-09-03 01:50:00
2004-09-03 02:45:00
2004-09-03 03:40:00
2004-09-03 04:35:00
2004-09-03 05:30:00
2004-09-03 05:37:30
2004-09-03 05:45:00
2004-09-03 05:52:30
The code relies on a few concepts:
A date can be represented as a string or as a datenum (a serial date number). I use the built-in functions datenum and datestr to convert between them.
Once you have the date/time as a number, it is easy to interpolate.
I use the diff function to find the difference between successive times.
I don't attempt to "vectorize" the code - you were not asking for efficient code, and for an example like this the clarity of a for loop trumps everything.
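The same three concepts carry over to other languages. As a rough illustration, here is a Python sketch on the first three dates only. The reference point `EPOCH` and the helpers `to_number` and `to_string` are arbitrary choices of mine, and unlike a loop that stops one step short of each next date, this version appends the last date explicitly.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
EPOCH = datetime(2004, 1, 1)  # arbitrary reference point (an assumption)

def to_number(s):
    """String -> seconds since EPOCH (the 'date as a number' idea)."""
    return (datetime.strptime(s, FMT) - EPOCH).total_seconds()

def to_string(x):
    """Seconds since EPOCH -> string in the original format."""
    return (EPOCH + timedelta(seconds=x)).strftime(FMT)

date_strings = ["2004-09-02 06:00:00", "2004-09-02 07:30:00", "2004-09-02 12:00:00"]
v = [6, 5]  # number of segments per interval

nums = [to_number(s) for s in date_strings]
# Difference between successive times, divided by the segment count.
steps = [(b - a) / n for a, b, n in zip(nums, nums[1:], v)]

new_nums = []
for start, step, n in zip(nums, steps, v):
    # range(n) stops before the next date, so endpoints are not duplicated.
    new_nums.extend(start + k * step for k in range(n))
new_nums.append(nums[-1])  # add the final date once

new_strings = [to_string(x) for x in new_nums]
print("\n".join(new_strings))
```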