Aggregations
Start by importing:
import scalaql._
// Docs classes
import scalaql.docs.Hogwarts._
Basic groupByβ
You could define aggregations for your data as you do with plain SQL
.
To evaluate such kind of aggregation:
case class FacultyInfo(
name: String,
avgAge: Double,
totalGrade: Double,
specializations: List[String])
First, start with groupBy
:
val grouped = select[Student].groupBy(_.faculty)
// grouped: Query.GroupByQuery[From[Student], Student, String] = FROM(Hogwarts::Student) -> GROUP BY(String)
Then you could aggregate this query:
val facultyInfos = grouped
.aggregate((faculty, students) =>
(
students.avgBy(_.age.toDouble) &&
students.sumBy(_.grade) &&
students.toListBy(_.specialization)
).map {
case (avgAge, totalGrade, specializations) =>
FacultyInfo(faculty, avgAge, totalGrade, specializations)
}
)
// facultyInfos: Query[From[Student], FacultyInfo] = FROM(Hogwarts::Student) -> GROUP BY(String) -> AGGREGATE(MdocSession::App::FacultyInfo)
aggregate
function provides you the grouping key (in this case - faculty
) and a special value to declare your
aggregations.
In this example:
- To calculate average age - use
students.avgBy(_.age.toDouble)
- To get total grade - use
students.sumBy(_.grade)
- To collect each specialization of the faculty - use
students.toListBy(_.specialization)
You could chain multiple aggregations by &&
operator. It will automatically flatten the results into a single tuple.
Then it's possible to map
the aggregation result.
This is the result of running such query:
facultyInfos
.show(truncate=false)
.run(
from(students)
)
// +----------+------------------+----------+-----------------------------------------+
// |name |avgAge |totalGrade|specializations |
// +----------+------------------+----------+-----------------------------------------+
// |Gryffindor|18.333333333333332|250.9 |[getting into troubles, eating, learning]|
// |Slytherin |18.0 |85.1 |[trolling] |
// |Hufflepuff|17.0 |90.1 |[young dying] |
// +----------+------------------+----------+-----------------------------------------+
//
Reportsβ
report
aggregation function is somehow similar to groupBy
with multiple keys.
The difference is that report
allows you to do aggregations on intermediate results by using combine
method.
First, start by defining report case class:
case class BySpec(specialization: String, avgGrade: Double)
case class ByAge(age: Int, grades: List[BySpec])
case class ByFaculty(faculty: String, grades: List[ByAge])
Then, you could define the aggreagtion as follows:
val byFacultyReportQuery: Query[From[Student], ByFaculty] =
select[Student]
.groupBy(_.faculty)
.aggregate { case (faculty, students) =>
students
.report(_.age, _.specialization) { (age, spec, students) =>
// We know Dumbledore too well =)
val additionalPoints = if (faculty == Gryffindor.name) 100 else 0
students.avgBy(_.grade + additionalPoints)
.map(BySpec(spec, _))
}
.combine((age, cascadeInfos) => cascadeInfos.toList.map(ByAge(age, _)))
.map(byAge => ByFaculty(faculty, byAge))
}
// byFacultyReportQuery: Query[From[Student], ByFaculty] = FROM(Hogwarts::Student) -> GROUP BY(String) -> AGGREGATE(MdocSession::App::ByFaculty)
In this case, you define your aggregation as always. Notice the following:
- Usage of
report
aggregation function report
accepts grouping keys (likeage
andspecialization
)- First, you define aggregations per all grouping keys (both
age
andspecialization
) - Then, using
combine
, you aggregate onage
grouping key level - Inside
combine
you'll use a deeper level aggregation result - ... and so on until you combine all nested aggregations
Then run the query:
byFacultyReportQuery
.show(truncate = false)
.run(from(students))
// +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
// |faculty |grades |
// +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
// |Gryffindor|[{age: 18, grades: [{specialization: eating, avgGrade: 166.2}, {specialization: learning, avgGrade: 199.6}]}, {age: 19, grades: [{specialization: getting into troubles, avgGrade: 185.1}]}]|
// |Slytherin |[{age: 18, grades: [{specialization: trolling, avgGrade: 85.1}]}] |
// |Hufflepuff|[{age: 17, grades: [{specialization: young dying, avgGrade: 90.1}]}] |
// +----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
//
Available aggregation functionsβ
toList
- collects records into aList
distinct
- collects distinct records into aSet
distinctBy
- collects distinct records, with specific field to distinguish byflatDistinctBy
- same asdistinctBy
, but for a collection of fieldsconst
- return a constant value as an aggregation resultsum
- sum the fieldsumBy
- sum the field by some other valueproduct
- multiplication of field valuesproductBy
- multiplication of field values by some other valueavg
- average value of the fieldavgBy
- average value of the field by some other valuestd
- standard deviation of the fieldstdBy
- standard deviation of the field by some other valuecount
- number of values for each given predicate holdssize
- total number of valuesreport(K1, K2, ...)
- report of arity N