PySpark Formula
Formula
The formula module contains code to extract values from a record (e.g. a row of a Spark dataframe) based on the model definition.
class ydot.formula.CatExtractor(record, term)
    Bases: ydot.formula.Extractor

    Categorical extractor (no levels).

    __init__(record, term)
        Constructor.

        Parameters
            record – Dictionary.
            term – Model term.
        Returns
            None.

    property value
        Gets the extracted value.

class ydot.formula.ConExtractor(record, term)
    Bases: ydot.formula.Extractor

    Continuous extractor (no functions).

    __init__(record, term)
        Constructor.

        Parameters
            record – Dictionary.
            term – Model term.
        Returns
            None.

    property value
        Gets the extracted value.

class ydot.formula.Extractor(record, term, term_type)
    Bases: abc.ABC

    Extractor that gets a value based on a model term.

    __init__(record, term, term_type)
        Constructor.

        Parameters
            record – Dictionary.
            term – Model term.
            term_type – Type of term.
        Returns
            None.

    abstract property value
        Gets the extracted value.

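Extractor follows Python's abstract-base-class pattern: it cannot be instantiated directly, and each subclass must implement the abstract `value` property. The sketch below mirrors that shape with hypothetical classes (`SketchExtractor`, `SketchConExtractor`, `SketchIntExtractor`). The float conversion for continuous terms is an assumption, and `term_type` is passed as a plain string only to keep the block self-contained; this is not ydot's actual code.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchExtractor(ABC):
    """Hypothetical stand-in for ydot.formula.Extractor."""

    def __init__(self, record: Dict[str, Any], term: str, term_type: str) -> None:
        self.record = record
        self.term = term
        self.term_type = term_type  # assumption: a string instead of a TermEnum

    @property
    @abstractmethod
    def value(self) -> float:
        """Subclasses return the value extracted for self.term."""


class SketchConExtractor(SketchExtractor):
    """Continuous term: assumed to be the record's field coerced to float."""

    def __init__(self, record: Dict[str, Any], term: str) -> None:
        super().__init__(record, term, "CON")

    @property
    def value(self) -> float:
        return float(self.record[self.term])


class SketchIntExtractor(SketchExtractor):
    """Intercept term: always 1.0, as documented for IntExtractor."""

    def __init__(self, record: Dict[str, Any], term: str) -> None:
        super().__init__(record, term, "INT")

    @property
    def value(self) -> float:
        return 1.0


record = {"x1": 3, "a": "left"}
print(SketchConExtractor(record, "x1").value)         # 3.0
print(SketchIntExtractor(record, "Intercept").value)  # 1.0

try:
    SketchExtractor(record, "x1", "CON")  # abstract: raises TypeError
except TypeError:
    print("abstract base class cannot be instantiated")
```

Stacking `@property` over `@abstractmethod` makes the missing override an instantiation-time error rather than a failure on first access.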
class ydot.formula.FunExtractor(record, term)
    Bases: ydot.formula.Extractor

    Continuous extractor (with function transformations).

    __init__(record, term)
        Constructor.

        Parameters
            record – Dictionary.
            term – Model term.
        Returns
            None.

    property value
        Gets the extracted value.

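Because the formulas are R-like and based on patsy, a continuous term can wrap a field in a function, e.g. `np.log(x1)`. The following is a minimal sketch of evaluating such a term against a dictionary record. It assumes single-argument terms of the form `name(field)` and uses a hypothetical whitelist `FUNCTIONS` backed by the standard `math` module. The function `fun_value` is illustrative only and is not ydot's parsing logic.

```python
import math
import re
from typing import Any, Callable, Dict

# Hypothetical whitelist of allowed transformations (assumption, not ydot's list).
FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "np.log": math.log,
    "np.exp": math.exp,
    "np.sqrt": math.sqrt,
    "np.abs": abs,
}

# Matches terms such as "np.log(x1)": group 1 is the function, group 2 the field.
TERM_PATTERN = re.compile(r"^\s*([\w.]+)\(\s*(\w+)\s*\)\s*$")


def fun_value(record: Dict[str, Any], term: str) -> float:
    """Apply the function named in `term` to the record field it wraps."""
    match = TERM_PATTERN.match(term)
    if match is None:
        raise ValueError(f"not a function term: {term!r}")
    fn_name, field = match.groups()
    if fn_name not in FUNCTIONS:
        raise ValueError(f"unsupported function: {fn_name!r}")
    return float(FUNCTIONS[fn_name](float(record[field])))


record = {"x1": 4.0, "x2": -2.5}
print(fun_value(record, "np.sqrt(x1)"))  # 2.0
print(fun_value(record, "np.abs(x2)"))   # 2.5
```

A whitelist avoids calling `eval` on formula text, at the cost of supporting only the functions listed.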
class ydot.formula.IntExtractor(record, term)
    Bases: ydot.formula.Extractor

    Intercept extractor. Always returns 1.0.

    __init__(record, term)
        Constructor.

        Parameters
            record – Dictionary.
            term – Model term.
        Returns
            None.

    property value
        Gets the extracted value.

class ydot.formula.InteractionExtractor(record, terms)
    Bases: object

    Interaction extractor for interaction effects.

    __init__(record, terms)
        Constructor.

        Parameters
            record – Dictionary.
            terms – Model term (possibly with interaction effects).
        Returns
            None.

    property value
        Gets the extracted value.

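In patsy's formula language, an interaction term such as `x1:x2` names a column whose value is the product of its components. The sketch below shows that idea for dictionary records. It assumes every component is continuous; a categorical component would first have to be turned into a 0/1 indicator. The function `interaction_value` is hypothetical and is not ydot's implementation.

```python
from functools import reduce
from operator import mul
from typing import Any, Dict


def interaction_value(record: Dict[str, Any], terms: str) -> float:
    """Multiply the values of the ':'-separated fields in `terms`."""
    components = [part.strip() for part in terms.split(":")]
    values = [float(record[name]) for name in components]
    return reduce(mul, values, 1.0)


record = {"x1": 2.0, "x2": 3.0, "x3": 0.5}
print(interaction_value(record, "x1:x2"))     # 6.0
print(interaction_value(record, "x1:x2:x3"))  # 3.0
print(interaction_value(record, "x1"))        # 2.0 (no interaction)
```

A term without `:` reduces to the single field's value, so the same function covers plain and interaction terms.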
class ydot.formula.LvlExtractor(record, term)
    Bases: ydot.formula.Extractor

    Categorical extractor (with levels).

    __init__(record, term)
        Constructor.

        Parameters
            record – Dictionary.
            term – Model term.
        Returns
            None.

    property value
        Gets the extracted value.

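A categorical variable expands into one indicator column per (non-reference) level. patsy names these columns like `a[T.right]` under its default treatment coding, or `C(a, levels=[...])[T.right]` when the levels are given explicitly. The following is a minimal sketch of extracting such an indicator from a record, assuming those patsy-style names. The regular expression and the function `level_value` are hypothetical and are not ydot's implementation.

```python
import re
from typing import Any, Dict

# Matches "a[T.right]" or "C(a, levels=[...])[T.right]".
# Group 1 (C(...) form) or group 2 (bare name) is the field; group 3 is the level.
LEVEL_PATTERN = re.compile(r"^(?:C\(\s*(\w+)[^)]*\)|(\w+))\[T\.(.+)\]$")


def level_value(record: Dict[str, Any], term: str) -> float:
    """Return 1.0 if the record's field equals the term's level, else 0.0."""
    match = LEVEL_PATTERN.match(term)
    if match is None:
        raise ValueError(f"not a level term: {term!r}")
    field = match.group(1) or match.group(2)
    level = match.group(3)
    return 1.0 if str(record[field]) == level else 0.0


record = {"a": "right", "b": "low"}
print(level_value(record, "a[T.right]"))                              # 1.0
print(level_value(record, "C(a, levels=['left', 'right'])[T.left]"))  # 0.0
print(level_value(record, "b[T.high]"))                               # 0.0
```

Comparing via `str(...)` lets numeric category codes in the record match the textual level in the column name.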
class ydot.formula.TermEnum(value)
    Bases: enum.IntEnum

    Term types.

    CAT = 1
        Categorical without levels specified.
    LVL = 2
        Categorical with levels specified.
    CON = 3
        Continuous.
    FUN = 4
        Continuous with function transformations.
    INT = 5
        Intercept.

    static get_extractor(record, term)
        Gets the associated extractor based on the specified term.

        Parameters
            record – Dictionary.
            term – Model term.
        Returns
            Extractor.
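get_extractor chooses an extractor based on the term's type. The sketch below reproduces the documented TermEnum members and values and shows one plausible way to classify patsy-style column names into them. The rules in the hypothetical `classify_term` are assumptions, and ydot's actual rules may differ. Interaction terms (containing `:`) are handled by InteractionExtractor and are not classified here.

```python
from enum import IntEnum


class TermEnum(IntEnum):
    """Term types, with the values documented for ydot.formula.TermEnum."""

    CAT = 1  # categorical without levels specified
    LVL = 2  # categorical with levels specified
    CON = 3  # continuous
    FUN = 4  # continuous with function transformations
    INT = 5  # intercept


def classify_term(term: str) -> TermEnum:
    """Hypothetical rules for deciding a term's type from its column name."""
    if term == "Intercept":
        return TermEnum.INT
    if term.endswith("]"):
        # Indicator column of a categorical variable, e.g. "a[T.right]".
        return TermEnum.LVL if "levels=" in term else TermEnum.CAT
    if "(" in term:
        return TermEnum.FUN  # e.g. "np.log(x1)"
    return TermEnum.CON      # e.g. "x1"


for t in ["Intercept", "a[T.right]", "C(a, levels=['left', 'right'])[T.left]",
          "np.log(x1)", "x1"]:
    print(t, "->", classify_term(t).name)
```

The order of the checks matters: indicator names are tested before function calls because `C(...)[T.x]` also contains a parenthesis.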
Spark
The spark module contains code to transform a Spark dataframe into design matrices as specified by a formula.
ydot.spark.get_columns(formula, sdf, profile=None)
    Gets the expanded columns of the specified Spark dataframe using the specified formula.

    Parameters
        formula – Formula (R-like, based on patsy).
        sdf – Spark dataframe.
        profile – Profile. Default is None, in which case the profile is determined empirically.
    Returns
        Tuple of columns for y, X.
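To illustrate what "expanded columns" means: with an intercept, patsy's default treatment coding drops the first level of a categorical field as the reference and names each remaining indicator column `field[T.level]`, while a continuous field keeps a single column. The following is a minimal pure-Python sketch of that expansion for formulas that use only `+`. It assumes a hypothetical profile format that maps each categorical field to its sorted levels, with continuous fields absent. The function `expand_columns` is illustrative and is not ydot's function. It keeps terms in formula order, whereas patsy itself may reorder terms.

```python
from typing import Dict, List, Tuple


def expand_columns(formula: str,
                   profile: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Split 'y ~ a + x1' into y and X column names (additive terms only)."""
    lhs, rhs = (side.strip() for side in formula.split("~"))
    x_cols = ["Intercept"]  # assumption: the formula keeps the default intercept
    for term in (t.strip() for t in rhs.split("+")):
        levels = profile.get(term)
        if levels is None:
            x_cols.append(term)  # continuous: one column, named after the field
        else:
            # treatment coding: the first level is the reference, so it is dropped
            x_cols.extend(f"{term}[T.{level}]" for level in levels[1:])
    return [lhs], x_cols


profile = {"a": ["left", "middle", "right"]}
y_cols, x_cols = expand_columns("y ~ a + x1", profile)
print(y_cols)  # ['y']
print(x_cols)  # ['Intercept', 'a[T.middle]', 'a[T.right]', 'x1']
```

Dropping the reference level keeps X full rank when an intercept column is present.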
ydot.spark.get_profile(sdf)
    Gets the field profiles of the specified Spark dataframe.

    Parameters
        sdf – Spark dataframe.
    Returns
        Dictionary.
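The sketch below illustrates what profiling a dataset empirically can involve. It works on a list of dictionary records rather than a Spark dataframe. Fields holding strings are treated as categorical and their sorted distinct levels are collected; all other fields are treated as continuous, and `None` values are ignored. The output format and the function `profile_records` are hypothetical, and ydot's actual profile structure may differ. On a real Spark dataframe, distinct levels could be gathered with something like `sdf.select(name).distinct().collect()`.

```python
from typing import Any, Dict, List


def profile_records(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Classify each field as categorical (with sorted levels) or continuous."""
    fields = sorted({name for record in records for name in record})
    profile: Dict[str, Dict[str, Any]] = {}
    for name in fields:
        # Ignore missing values when deciding the field's type.
        values = [r[name] for r in records if r.get(name) is not None]
        if any(isinstance(v, str) for v in values):
            profile[name] = {"type": "categorical",
                             "levels": sorted({str(v) for v in values})}
        else:
            profile[name] = {"type": "continuous"}
    return profile


records = [
    {"a": "right", "x1": 1.5, "y": 1.0},
    {"a": "left", "x1": 0.2, "y": 0.0},
    {"a": "right", "x1": None, "y": 1.0},
]
print(profile_records(records))
```

Sorting the levels gives a deterministic level order, which matters when the first level is used as the reference category.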
ydot.spark.smatrices(formula, sdf, profile=None)
    Gets a tuple of design/model matrices.

    Parameters
        formula – Formula (R-like, based on patsy).
        sdf – Spark dataframe.
        profile – Dictionary of data profile. Default is None.
    Returns
        Tuple of y, X Spark dataframes.