Scala: how to get up to first n characters from a large string? - scala

Using Scala, I am grabbing a json response object from a web API and storing the response as a string s. This string is at least several kilobytes. Because sometimes this response can provide some funky stuff hinting at errors or issues with the API I want to print out a preview of the response to our logs. That way I can see the log and tell that the job is either running successfully or has failed. Is there an efficient and safe way to grab the first 100 or so characters from a string? The string may occasionally be very small so grabbing via a slice I think will cause an index out of range issue.
val n = 100
val myString: String = getResponseAsString()//returns small or very large string
logger.warn(s"Data: $myString") //how to print only first 'n' chars?

"long string".take(4) // "long"
"x".take(4) // "x"
take

val n :Int = ...
val myString :String = ...
logger.warn(s"Data: %.${n}s" format myString)

Related

String transformation for subject course code for Dart/Flutter

For interaction with an API, I need to pass the course code in <string><space><number> format. For example, MCTE 2333, CCUB 3621, BTE 1021.
Yes, the text part can be 3 or 4 letters.
Most users enter the code without the space, eg: MCTE2333. But that causes error to the API. So how can I add a space between string and numbers so that it follows the correct format.
You can achieve the desired behaviour by using regular expressions:
void main() {
String a = "MCTE2333";
String aStr = a.replaceAll(RegExp(r'[^0-9]'), ''); //extract the number
String bStr = a.replaceAll(RegExp(r'[^A-Za-z]'), ''); //extract the character
print("$bStr $aStr"); //MCTE 2333
}
Note: This will produce the same result, regardless of how many whitespaces your user enters between the characters and numbers.
Try this.You have to give two texfields. One is for name i.e; MCTE and one is for numbers i.e; 1021. (for this textfield you have to change keyboard type only number).
After that you can join those string with space between them and send to your DB.
It's just like hack but it will work.
Scrolling down the course codes list, I noticed some unusual formatting.
Example: TQB 1001E, TQB 1001E etc. (With extra letter at the end)
So, this special format doesn't work with #Jahidul Islam's answer. However, inspired by his answer, I manage to come up with this logic:
var code = "TQB2001M";
var i = course.indexOf(RegExp(r'[^A-Za-z]')); // get the index
var j = course.substring(0, i); // extract the first half
var k = course.substring(i).trim(); // extract the others
var formatted = '$j $k'.toUpperCase(); // combine & capitalize
print(formatted); // TQB 1011M
Works with other formats too. Check out the DartPad here.
Here is the entire logic you need (also works for multiple whitespaces!):
void main() {
String courseCode= "MMM 111";
String parsedCourseCode = "";
if (courseCode.contains(" ")) {
final ensureSingleWhitespace = RegExp(r"(?! )\s+| \s+");
parsedCourseCode = courseCode.split(ensureSingleWhitespace).join(" ");
} else {
final r1 = RegExp(r'[0-9]', caseSensitive: false);
final r2 = RegExp(r'[a-z]', caseSensitive: false);
final letters = courseCode.split(r1);
final numbers = courseCode.split(r2);
parsedCourseCode = "${letters[0].trim()} ${numbers.last}";
}
print(parsedCourseCode);
}
Play around with the input value (courseCode) to test it - also use dart pad if you want. You just have to add this logic to your input value, before submitting / handling the input form of your user :)

Roblox- how to store large arrays in roblox datastores

i am trying to make a game where players create their own buildings and can then save them for other players to see and play on. However, roblox doesn't let me store all the data needed for the whole creation(there are several properties for each brick)
All i get is this error code:
104: Cannot store Array in DataStore
any help would be greatly appreciated!
I'm not sure if this is the best method, but it's my attempt. Below is an example of a table, you can use tables to store several values. I think you can use HttpService's JSONEncode function to convert tables into strings (which hopefully can be saved more efficiently)
JSONEncode (putting brick's data into a string, which you can save into the DataStore
local HttpService = game:GetService("HttpService")
-- this is an example of what we'll convert into a json string
local exampleBrick = {
["Size"] = Vector3.new(3,3,3),
["Position"] = Vector3.new(0,1.5,0),
["BrickColor"] = BrickColor.new("White")
["Material"] = "Concrete"
}
local brickJSON = HttpService:JSONEncode(exampleBrick)
print(brickJSON)
-- when printed, you'll get something like
-- { "Size": Vector3.new(3,3,3), "Position": Vector3.new(0,1.5,0), "BrickColor": BrickColor.new("White"), "Material": "Concrete"}
-- if you want to refer to this string in a script, surround it with two square brackets ([[) e.g. [[{"Size": Vector3.new(3,3,3)... }]]
JSONDecode (reading the string and converting it back into a brick)
local HttpService = game:GetService("HttpService")
local brickJSON = [[ {"Size": Vector3.new(3,3,3), "Position": Vector3.new(0,1.5,0), "BrickColor": BrickColor.new("White"), "Material": "Concrete"} ]]
function createBrick(tab)
local brick = Instance.new("Part")
brick.Parent = <insert parent here>
brick.Size = tab[1]
brick.Position= tab[2]
brick.BrickColor= tab[3]
brick.Material= tab[4]
end
local brickData = HttpService:JSONDecode(brickJSON)
createBrick(brickData) --this line actually spawns the brick
The function can also be wrapped in a pcall if you want to account for any possible datastore errors.
Encoding a whole model into a string
Say your player's 'building' is a model, you can use the above encode script to convert all parts inside a model into a json string to save.
local HttpService = game:GetService("HttpService")
local StuffWeWantToSave = {}
function getPartData(part)
return( {part.Size,part.Position,part.BrickColor,part.Material} )
end
local model = workspace.Building --change this to what the model is
local modelTable = model:Descendants()
for i,v in pairs(modelTable) do
if v:IsA("Part") or v:IsA("WedgePart") then
table.insert(StuffWeWantToSave, HttpService:JSONEncode(getPartData(modelTable[v])))
end
end
Decoding a string into a whole model
This will probably occur when the server is loading a player's data.
local HttpService = game:GetService("HttpService")
local SavedStuff = game:GetService("DataStoreService"):GetDataStore("blabla") --I don't know how you save your data, so you'll need to adjust this and the rest of the scripts (as long as you've saved the string somewhere in the player's DataStore)
function createBrick(tab)
local brick = Instance.new("Part")
brick.Parent = <insert parent here>
brick.Size = tab[1]
brick.Position= tab[2]
brick.BrickColor= tab[3]
brick.Material= tab[4]
end
local model = Instance.new("Model") --if you already have 'bases' for the players to load their stuff in, remove this instance.new
model.Parent = workspace
for i,v in pairs(SavedStuff) do
if v[1] ~= nil then
CreateBrick(v)
end
end
FilteringEnabled
If your game uses filteringenabled, make sure that only the server handles saving and loading data!! (you probably already knew that) If you want the player to save by clicking a gui button, make the gui button fire a RemoteFunction that sends their base's data to the server to convert it to a string.
BTW I'm not that good at scripting so I've probably made a mistake somehwere.. good luck though
Crabway's answer is correct in that the HttpService's JSONEncode and JSONDecode methods are the way to go about tackling this problem. As it says on the developer reference page for the DataStoreService, Data is ... saved as a string in data stores, regardless of its initial type. (https://developer.roblox.com/articles/Datastore-Errors.) This explains the error you received, as you cannot simply push a table to the data store; instead, you must first encode a table's data into a string using JSONEncode.
While I agree with much of Crabway's answer, I believe the function createBrick would not behave as intended. Consider the following trivial example:
httpService = game:GetService("HttpService")
t = {
hello = 1,
goodbye = 2
}
s = httpService:JSONEncode(t)
print(s)
> {"goodbye":2,"hello":1}
u = httpService:JSONDecode(s)
for k, v in pairs(u) do print(k, v) end
> hello 1
> goodbye 2
As you can see, the table returned by JSONDecode, like the original, uses strings as keys rather than numeric indices. Therefore, createBrick should be written something like this:
function createBrick(t)
local brick = Instance.new("Part")
brick.Size = t.Size
brick.Position = t.Position
brick.BrickColor = t.BrickColor
brick.Material = t.Material
-- FIXME: set any other necessary properties.
-- NOTE: try to set parent last for optimization reasons.
brick.Parent = t.Parent
return brick
end
As for encoding a model, calling GetChildren would produce a table of the model's children, which you could then loop through and encode the properties of everything within. Note that in Crabway's answer, he only accounts for Parts and WedgeParts. You should account for all parts using object:IsA("BasePart") and also check for unions with object:IsA("UnionOperation"). The following is a very basic example in which I do not store the encoded data; rather, I am just trying to show how to check the necessary cases.
function encodeModel(model)
local children = model:GetChildren()
for _, child in ipairs(children) do
if ((child:IsA("BasePart")) or (child:IsA("UnionOperation"))) then
-- FIXME: encode child
else if (child:IsA("Model")) then
-- FIXME: using recursion, loop through the sub-model's children.
end
end
return
end
For userdata, such as Vector3s or BrickColors, you will probably want to convert those to strings when you go to encode them with JSONEncode.
-- Example: part with "Brick red" BrickColor.
color = tostring(part.BrickColor)
print(string.format("%q", color))
> "Bright red"
I suggest what #Crabway said, use HttpService.
local httpService = game:GetService("HttpService")
print(httpService:JSONEncode({a = "b", b = "c"}) -- {"a":"b","b":"c"}
But if you have any UserData values such as Vector3s, CFrames, Color3s, BrickColors and Enum items, then use this library by Defaultio. It's actually pretty nice.
local library = require(workspace:WaitForChild("JSONWithUserdata"))
library:Encode({Vector3.new(0, 0, 0)})
If you want a little documentation, then look at the first comment in the script:
-- Defaultio
--[[
This module adds support for encoding userdata values to JSON strings.
It also supports lists which skip indices, such as {[1] = "a", [2] = "b", [4] = "c"}
Userdata support is implemented by replacing userdata types with a new table, with keys _T and _V:
_T = userdata type enum (index in the supportedUserdataTypes list)
_V = a value or table representing the value
Follow the examples bellow to add suppport for additional userdata types.
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
Usage example:
local myTable = {CFrame.new(), BrickColor.Random(), 4, "String", Enum.Material.CorrodedMetal}
local jsonModule = require(PATH_TO_MODULE)
local jsonString = jsonModule:Encode(myTable)
local decodedTable = jsonModule:Decode(jsonString)
--]]

SparkSQL performance issue with collect method

We are currently facing a performance issue in sparksql written in scala language. Application flow is mentioned below.
Spark application reads a text file from input hdfs directory
Creates a data frame on top of the file using programmatically specifying schema. This dataframe will be an exact replication of the input file kept in memory. Will have around 18 columns in the dataframe
var eqpDF = sqlContext.createDataFrame(eqpRowRdd, eqpSchema)
Creates a filtered dataframe from the first data frame constructed in step 2. This dataframe will contain unique account numbers with the help of distinct keyword.
var distAccNrsDF = eqpDF.select("accountnumber").distinct().collect()
Using the two dataframes constructed in step 2 & 3, we will get all the records which belong to one account number and do some Json parsing logic on top of the filtered data.
var filtrEqpDF =
eqpDF.where("accountnumber='" + data.getString(0) + "'").collect()
Finally the json parsed data will be put into Hbase table
Here we are facing performance issues while calling the collect method on top of the data frames. Because collect will fetch all the data into a single node and then do the processing, thus losing the parallel processing benefit.
Also in real scenario there will be 10 billion records of data which we can expect. Hence collecting all those records in to driver node will might crash the program itself due to memory or disk space limitations.
I don't think the take method can be used in our case which will fetch limited number of records at a time. We have to get all the unique account numbers from the whole data and hence I am not sure whether take method, which takes
limited records at a time, will suit our requirements
Appreciate any help to avoid calling collect methods and have some other best practises to follow. Code snippets/suggestions/git links will be very helpful if anyone have had faced similar issues
Code snippet
val eqpSchemaString = "acoountnumber ....."
val eqpSchema = StructType(eqpSchemaString.split(" ").map(fieldName =>
StructField(fieldName, StringType, true)));
val eqpRdd = sc.textFile(inputPath)
val eqpRowRdd = eqpRdd.map(_.split(",")).map(eqpRow => Row(eqpRow(0).trim, eqpRow(1).trim, ....)
var eqpDF = sqlContext.createDataFrame(eqpRowRdd, eqpSchema);
var distAccNrsDF = eqpDF.select("accountnumber").distinct().collect()
distAccNrsDF.foreach { data =>
var filtrEqpDF = eqpDF.where("accountnumber='" + data.getString(0) + "'").collect()
var result = new JSONObject()
result.put("jsonSchemaVersion", "1.0")
val firstRowAcc = filtrEqpDF(0)
//Json parsing logic
{
.....
.....
}
}
The approach usually take in this kind of situation is:
Instead of collect, invoke foreachPartition: foreachPartition applies a function to each partition (represented by an Iterator[Row]) of the underlying DataFrame separately (the partition being the atomic unit of parallelism of Spark)
the function will open a connection to HBase (thus making it one per partition) and send all the contained values through this connection
This means the every executor opens a connection (which is not serializable but lives within the boundaries of the function, thus not needing to be sent across the network) and independently sends its contents to HBase, without any need to collect all data on the driver (or any one node, for that matter).
It looks like you are reading a CSV file, so probably something like the following will do the trick:
spark.read.csv(inputPath). // Using DataFrameReader but your way works too
foreachPartition { rows =>
val conn = ??? // Create HBase connection
for (row <- rows) { // Loop over the iterator
val data = parseJson(row) // Your parsing logic
??? // Use 'conn' to save 'data'
}
}
You can ignore collect in your code if you have large set of data.
Collect Return all the elements of the dataset as an array at the driver program. This is usually useful after a filter or other operation that returns a sufficiently small subset of the data.
Also this can cause the driver to run out of memory, though, because collect() fetches the entire RDD/DF to a single machine.
I have just edited your code, which should work for you.
var distAccNrsDF = eqpDF.select("accountnumber").distinct()
distAccNrsDF.foreach { data =>
var filtrEqpDF = eqpDF.where("accountnumber='" + data.getString(0) + "'")
var result = new JSONObject()
result.put("jsonSchemaVersion", "1.0")
val firstRowAcc = filtrEqpDF(0)
//Json parsing logic
{
.....
.....
}
}

Neo4j: Converting REST call output to JSON

I have a requirement to convert the output of cypher into JSON.
Here is my code snippet.
RestCypherQueryEngine rcqer=new RestCypherQueryEngine(restapi);
String nodeN = "MATCH n=(Company) WITH COLLECT(n) AS paths RETURN EXTRACT(k IN paths | LAST(nodes(k))) as lastNode";
final QueryResult<Map<String,Object>> queryResult = rcqer.query(searchQuery);
for(Map<String,Object> row:queryResult)
{
System.out.println((ArrayList)row.get("lastNode"));
}
Output:
[http://XXX.YY6.192.103:7474/db/data/node/445, http://XXX.YY6.192.103:7474/db/data/node/446, http://XXX.YY6.192.103:7474/db/data/node/447, http://XXX.YY6.192.103:7474/db/data/node/448, http://XXX.YY6.192.103:7474/db/data/node/449, http://XXX.YY6.192.103:7474/db/data/node/450, http://XXX.YY6.192.103:7474/db/data/node/451, http://XXX.YY6.192.103:7474/db/data/node/452, http://XXX.YY6.192.103:7474/db/data/node/453]
I am not able to see the actual data (I am getting URL's). I am pretty sure I am missing something here.
I would also like to convert the output to JSON.
The cypher works in my browser interface.
I looked at various articles around this:
Java neo4j, REST and memory
Neo4j Cypher: How to iterate over ExecutionResult result
Converting ExecutionResult object to json
The last 2 make use of EmbeddedDatabase which may not be possible in my scenario (as the Neo is hosted in another cloud, hence the usage of REST).
Thanks.
Try to understand what you're doing? Your query does not make sense at all.
Perhaps you should re-visit the online course for Cypher: http://neo4j.com/online-course
MATCH n=(Company) WITH COLLECT(n) AS paths RETURN EXTRACT(k IN paths | LAST(nodes(k))) as lastNode
you can just do:
MATCH (c:Company) RETURN c
RestCypherQueryEngine rcqer=new RestCypherQueryEngine(restapi);
final QueryResult<Map<String,Object>> queryResult = rcqer.query(query);
for(Node node : queryResult.to(Node.class))
{
for (String prop : node.getPropertyKeys()) {
System.out.println(prop+" "+node.getProperty(prop));
}
}
I think it's better to use the JDBC driver for what you try to do, and also actually return the properties you're trying to convert to JSON.

Getting line locations with iText

How can one find where are lines located in a document with iText?
Suppose say I have a table in a PDF document, and want to read its contents; I would like to find where exactly the cells are located. In order to do that I thought I might find the intersections of lines.
I think your only option using iText will be to parse the PDF tokens manually. Before doing that I would have a copy of the PDF spec handy.
(I'm a .Net guy so I use iTextSharp but other than some capitalization differences and property declarations they're almost 100% the same.)
You can get the individual tokens using the PRTokeniser object which you feed bytes into from calling getPageContent(pageNum) on your PdfReader.
//Get bytes for page 1
byte[] pageBytes = reader.getPageContent(1);
//Get the tokens for page 1
PRTokeniser tokeniser = new PRTokeniser(pageBytes);
Then just loop through the PRTokeniser:
PRTokeniser.TokType tokenType;
string tokenValue;
while (tokeniser.nextToken()) {
tokenType = tokeniser.tokenType;
tokenValue = tokeniser.stringValue;
//...check tokenValue, do something with it
}
As far a tokenValue, you'd want to probably look for re and l values for rectangle and line. If you see an re then you want to look at the previous 4 values and if you see an l then previous 2 values. This also means that you need to store each tokenValue in an array so you can look back later.
Depending on what you used to create the PDF with you might get some interesting results. For instance, I created a 4 cell table with Microsoft Word and saved as a PDF. For some reason there are two sets of 10 rectangles with many duplicates, but the general idea still works.
Below is C# code targeting iTextSharp 5.1.1.0. You should be able to convert it to Java and iText very easily, I noted the one line that has .Net-specific code that needs to be adjusted from a Generic List (List<string>) to a Java equivalent, probably an ArrayList. You'll also need to adjust some casing, .Net uses Object.Method() whereas Java uses Object.method(). Lastly, .Net accesses properties without gets and sets, so Object.Property is both the getter and setter compared to Java's Object.getProperty and Object.setProperty.
Hopefully this gets you started at least!
//Source file to read from
string sourceFile = "c:\\Hello.pdf";
//Bind a reader to our PDF
PdfReader reader = new PdfReader(sourceFile);
//Create our buffer for previous token values. For Java users, List<string> is a generic list, probably most similar to an ArrayList
List<string> buf = new List<string>();
//Get the raw bytes for the page
byte[] pageBytes = reader.GetPageContent(1);
//Get the raw tokens from the bytes
PRTokeniser tokeniser = new PRTokeniser(pageBytes);
//Create some variables to set later
PRTokeniser.TokType tokenType;
string tokenValue;
//Loop through each token
while (tokeniser.NextToken()) {
//Get the types and value
tokenType = tokeniser.TokenType;
tokenValue = tokeniser.StringValue;
//If the type is a numeric type
if (tokenType == PRTokeniser.TokType.NUMBER) {
//Store it in our buffer for later user
buf.Add(tokenValue);
//Otherwise we only care about raw commands which are categorized as "OTHER"
} else if (tokenType == PRTokeniser.TokType.OTHER) {
//Look for a rectangle token
if (tokenValue == "re") {
//Sanity check, make sure we have enough items in the buffer
if (buf.Count < 4) throw new Exception("Not enough elements in buffer for a rectangle");
//Read and convert the values
float x = float.Parse(buf[buf.Count - 4]);
float y = float.Parse(buf[buf.Count - 3]);
float w = float.Parse(buf[buf.Count - 2]);
float h = float.Parse(buf[buf.Count - 1]);
//..do something with them here
}
}
}