What is an algebra? [closed] - scala

In the functional programming world, when I want to design an API, I encounter the term "algebra API".
Could someone please describe what an algebra is in FP, in the context of designing an API?
Which components make up an algebraic API? Laws, operations, etc.?
There is also the word "primitive". What exactly is a primitive? Please show me an example.

I think what you are referring to is algebraic data types.
Product Type
A common class of ADT is the product type. As an example, a "user" can be described as a combination of "name", "email address", and "age":
case class User(name: String, email: String, age: Int)
This is called a "product" type because we can count the number of possible distinct Users using multiplication:
distinct user count = (distinct name count) x (distinct email count) x (distinct age count)
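As a tiny illustration of the multiplication rule, here is a sketch with made-up fields:
// Each Boolean has 2 values, so Flags has 2 x 2 = 4 distinct values:
// Flags(false, false), Flags(false, true), Flags(true, false), Flags(true, true)
case class Flags(a: Boolean, b: Boolean)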
Sum Type
The other common ADT class is the sum type. As an example, a user can either be a common user or an administrator:
sealed trait AdminPowers // assumed defined elsewhere; shown so the example compiles
sealed trait User
case class CommonUser(name: String) extends User
case class AdminUser(name: String, powers: Set[AdminPowers]) extends User
This is called a "sum" type because we can count the number of possible distinct Users using addition:
distinct user count = (distinct common user count) + (distinct admin user count)
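The addition rule can also be seen with Either, the canonical sum type; a small sketch:
// Either[Boolean, Boolean] is a sum of two Booleans: 2 + 2 = 4 distinct values
val all: List[Either[Boolean, Boolean]] =
  List(Left(false), Left(true), Right(false), Right(true))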

Related

How can I use regex for filtering CloudWatch metrics with Grafana? [closed]

I am using Grafana ver. 8.4.5 and have configured CloudWatch as a datasource for Grafana.
I am using the Grafana Explore console for free-form querying and am trying to filter the AWS/SNS topic names that contain the word 'errors'.
I am using this syntax:
{SELECT SUM(NumberOfMessagesPublished) FROM "AWS/SNS" WHERE TopicName = '/error/'
But the returned value is an error:
'metric request error: "ValidationError: Error in expression 'querya40c2687332045be81b72e2637446bf7': Invalid syntax\n\tstatus code: 400, request id: 159bd510-bfde-449a-b637-e39a6094dd10"'
Is it even possible to use regex for monitoring a few topics in the same query? If so, can someone please assist with the syntax?
Thanks in advance.
It looks like you are trying to use a CloudWatch Metrics Insights query, so see the documentation first:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch-metrics-insights-querylanguage.html
The WHERE clause supports the following operators:
- = : Label value must match the specified string.
- != : Label value must not match the specified string.
- AND : Both specified conditions must be true to match. You can use multiple AND keywords to specify two or more conditions.
So unfortunately, regex matching is not supported in the way you are expecting.
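For reference, an exact-match query in that syntax would look like this (the topic name here is hypothetical):
SELECT SUM(NumberOfMessagesPublished) FROM "AWS/SNS" WHERE TopicName = 'my-errors-topic'
To look at several topics in one query, GROUP BY TopicName returns one series per topic, which you can then narrow down in Grafana.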

SQL - Best practice for handling mass arbitrary data [closed]

I have a massive delimited file and many normalized tables to load the data into. Is there a best practice for bringing in the data and inserting it into the proper fields and tables?
For instance, right now I've created a temp table that holds all the arbitrary data. Some logic runs against each row to determine which values will go into which table. Without getting into too many specifics, the part that concerns me looks something like:
INSERT INTO table VALUES (
(SELECT TOP 1 field1 FROM #tmpTable),
(SELECT TOP 1 field30 FROM #tmpTable),
(SELECT TOP 1 field2 FROM #tmpTable),
...
(SELECT TOP 1 field4 FROM #tmpTable))
With that, my questions are: Is it reasonable to use a temp table for this purpose? And is it poor practice to use these SELECT statements so liberally? It feels sort of hacky; is there a better way to handle mass data importing and separation like this?
You should try SSIS. See this tutorial:
SSIS How to Create an ETL Package
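If SSIS feels like overkill, a single set-based insert from the staging table usually beats per-column scalar subqueries; a sketch with hypothetical table and column names:
INSERT INTO TargetTable (Col1, Col30, Col2, Col4)
SELECT field1, field30, field2, field4
FROM #tmpTable;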

Filtering with Scala and Apache Spark [closed]

I have created an unlabeled Dataset which has some columns. The values in one of the columns are France, Germany, France, and UK.
I know how to filter and count using the code below.
val b = data.filter(_.contains("France")).count
However, I am not sure how to count values other than France.
I tried the code below but it is giving me the wrong result:
val a =data.filter(x=>x!="France").count
PS: My question is a bit similar to Is there a way to filter a field not containing something in a spark dataframe using scala? but I am looking for a simpler answer.
Your second snippet keeps elements that are not exactly equal to "France", but each element is a whole row rather than just a country name, so the equality check misses. Negate the contains check instead.
Try this:
val a = data.filter(!_.contains("France")).count
To cricket_007's point, it should be something like this:
val myDSCount = data.filter(row => row._1 != "France").count()
I am not sure which column your data is in, so row._1 may need to change to the correct position. You can run the following to see all of your columns:
data.printSchema
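If data is a DataFrame rather than a Dataset of raw lines, a column-based filter is another option; a sketch assuming a hypothetical column named "country":
import org.apache.spark.sql.functions.col

// Count rows whose country column is not "France"
val nonFranceCount = data.filter(col("country") =!= "France").count()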

How to construct an RDD or DataFrame dynamically? [closed]

I'm doing some pre-processing on a bunch of data. Each line has the following schema:
<row Att1="...." Att2="..." Attn="...." />
However, not all the attributes exist in all the rows. That is, some rows might have only three attributes while others have five, etc. Besides, there is no attribute indicating how many attributes exist within each row.
I would like to form an RDD or DataFrame (preferable) and run some queries on the data. However, I can't find a good way of splitting each row. For example, splitting by space does not work. I only need a few attributes in my processing. I tried to use pattern matching to extract 4 attributes that exist in all the rows as follows, but it fails:
val pattern = "Att1=(.*) Att3=(.*) Att10=(.*) Att11=(.*)".r
val rdd1 = sc.textFile("file.xml")
val rdd2 = rdd1.map { line =>
  line match {
    // Note: lines that do not match the pattern will throw a MatchError
    case pattern(att1, att3, att10, att11) => Post(att1, att3, att10, att11)
  }
}
case class Post(Att1: String, Att3: String, Att10: String, Att11: String)
P.S. I'm using Scala.
This is less of a Spark problem than it is a Scala problem. Is the data stored across multiple files?
I would recommend parallelizing by file and then parsing row by row.
For the parsing I would (see the sketch after this list):
- Create a case class of what you want the rows to look like (this will allow the schema to be inferred using reflection when creating the DataFrame).
- Create a list of name/regex tuples for the parsing, like: ("Attribute", regex).
- Map over the list of regexes and convert the results to a map: (Attribute -> Option[Value]).
- Create the case class objects.
This should lead to a data structure of List[CaseClass] or RDD[CaseClass] which can be converted to a DataFrame. You may need to do additional processing to filter out unneeded rows and to remove the Options.
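A minimal sketch of that approach (the attribute names, regexes, and case class fields are hypothetical, and attributes missing from a line simply become None):
import org.apache.spark.sql.SparkSession

case class Post(att1: Option[String], att3: Option[String])

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// One (name, regex) pair per attribute we care about
val extractors = List(
  "Att1" -> "Att1=\"([^\"]*)\"".r,
  "Att3" -> "Att3=\"([^\"]*)\"".r
)

val posts = spark.sparkContext.textFile("file.xml").map { line =>
  // Build a map of attribute name -> Option[value]
  val attrs = extractors.map { case (name, rx) =>
    name -> rx.findFirstMatchIn(line).map(_.group(1))
  }.toMap
  Post(attrs("Att1"), attrs("Att3"))
}

val df = posts.toDF() // schema inferred from the case class via reflection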

Slick & Scala: What are TableQueries? [closed]

I am a bit disappointed with Slick and its TableQueries: the model of an application can be, for example, a class Persons(tag: Tag) extends Table[Person] (where Person is a case class with some fields like name, age, address...).
The weird point is that val persons = TableQuery[Persons] contains all the records.
To have, for example, all the adults, we can use:
adults = persons.filter(p => p.age >= 18).list()
Is the content of the database loaded into the variable persons?
Is there, on the contrary, a mechanism that allows evaluating not "persons" but "adults" (a sort of lazy variable)?
Can we say something like "at any time, persons contains the entire database"?
Are there good practices or important ideas that can help the developer?
Thanks.
You are mistaken in your assumption that persons contains all of the records. The Table and TableQuery classes are representations of a SQL table, and the whole point of the library is to ease interaction with SQL databases by providing a convenient, Scala-like syntax.
When you say
val adults = persons.filter{ p => p.age >= 18 }
You've essentially created a SQL query that you can think of as
SELECT * FROM PERSONS WHERE AGE >= 18
Then when you call .list() it executes that query, transforming the result rows from the database back into instances of your Person case class. Most of the methods that have anything to do with Slick's Table or Query classes are focused on generating queries (i.e. "select" statements). They don't actually load any data until you invoke them (e.g. by calling .list() or .foreach).
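To make the laziness concrete, here is a minimal sketch in the Slick 2.x style used in the question (the implicit session setup is omitted):
// Nothing touches the database here: both values are just query descriptions
val persons = TableQuery[Persons]
val adults  = persons.filter(_.age >= 18)

// Only this call generates SQL and executes it, materializing the rows
val results: List[Person] = adults.list()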
As for good practices and important ideas, I'd suggest you read through their documentation, as well as take a look at the scaladocs for any of the classes you are curious about.
http://slick.typesafe.com/docs/