PySpark Formula
Formula
The formula module contains code to extract values from a record (e.g. a row of a Spark dataframe, represented as a dictionary) based on the model definition.
class ydot.formula.CatExtractor(record, term)
Bases: ydot.formula.Extractor

Categorical extractor (no levels).

__init__(record, term)
Constructor.
- Parameters
record – Dictionary.
term – Model term.
- Returns
None.

property value
Gets the extracted value.
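A minimal sketch of what a categorical indicator could look like, assuming a term is a patsy-style treatment-coded column name such as `color[T.red]` (patsy's naming convention; whether ydot passes terms in this form is an assumption). The helper `categorical_indicator` is hypothetical and is not ydot's implementation.

```python
import re

# Hypothetical: assumes the term is a patsy-style treatment-coded column
# name such as "color[T.red]"; ydot's real term format may differ.
_LEVEL_TERM = re.compile(r"^(?P<field>\w+)\[T\.(?P<level>[^\]]+)\]$")


def categorical_indicator(record: dict, term: str) -> float:
    """Return 1.0 if the record's field equals the level named in the term, else 0.0."""
    match = _LEVEL_TERM.match(term)
    if match is None:
        raise ValueError(f"not a categorical column name: {term!r}")
    field, level = match.group("field"), match.group("level")
    # Compare as strings so numeric codes such as 3 match "3";
    # a missing field becomes "None" and therefore yields 0.0.
    return 1.0 if str(record.get(field)) == level else 0.0


print(categorical_indicator({"color": "red", "x": 2.5}, "color[T.red]"))   # 1.0
print(categorical_indicator({"color": "blue", "x": 2.5}, "color[T.red]"))  # 0.0
```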
class ydot.formula.ConExtractor(record, term)
Bases: ydot.formula.Extractor

Continuous extractor (no functions).

__init__(record, term)
Constructor.
- Parameters
record – Dictionary.
term – Model term.
- Returns
None.

property value
Gets the extracted value.
class ydot.formula.Extractor(record, term, term_type)
Bases: abc.ABC

Extractor to get a value based on a model term.

__init__(record, term, term_type)
Constructor.
- Parameters
record – Dictionary.
term – Model term.
term_type – Type of term.
- Returns
None.

abstract property value
Gets the extracted value.
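The contract implied by `Bases: abc.ABC` and the abstract `value` property can be sketched with Python's `abc` module. `BaseExtractor` and `ContinuousSketch` are hypothetical names; only the constructor arguments and the abstract `value` property come from the documentation above.

```python
from abc import ABC, abstractmethod


class BaseExtractor(ABC):
    """Hypothetical base class mirroring the shape of ydot.formula.Extractor."""

    def __init__(self, record: dict, term: str, term_type: int):
        self.record = record
        self.term = term
        self.term_type = term_type

    @property
    @abstractmethod
    def value(self) -> float:
        """Subclasses must return the value extracted from the record."""


class ContinuousSketch(BaseExtractor):
    """Hypothetical continuous extractor: the term is a plain field name."""

    def __init__(self, record: dict, term: str):
        super().__init__(record, term, term_type=3)  # 3 mirrors TermEnum.CON

    @property
    def value(self) -> float:
        return float(self.record[self.term])


print(ContinuousSketch({"x": "2.5"}, "x").value)  # 2.5

try:
    BaseExtractor({}, "x", 3)
except TypeError as err:
    print("cannot instantiate:", type(err).__name__)
```

Because `value` is abstract, the base class itself cannot be instantiated; each subclass must override the property.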
class ydot.formula.FunExtractor(record, term)
Bases: ydot.formula.Extractor

Continuous extractor (with functions defined).

__init__(record, term)
Constructor.
- Parameters
record – Dictionary.
term – Model term.
- Returns
None.

property value
Gets the extracted value.
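A sketch of a continuous extractor with a function transformation, assuming a term looks like `log(x)` or `np.log(x)` and that the supported functions come from a fixed whitelist. The whitelist, the term format, and `function_value` are all hypothetical.

```python
import math
import re

# Hypothetical whitelist of supported transformations; ydot's actual set is not documented here.
_FUNCTIONS = {"log": math.log, "exp": math.exp, "sqrt": math.sqrt}

# Matches e.g. "log(x)" or "np.log(x)" (patsy formulas often call numpy functions).
_FUN_TERM = re.compile(r"^(?:np\.)?(?P<fn>\w+)\((?P<field>\w+)\)$")


def function_value(record: dict, term: str) -> float:
    """Apply the function named in the term to the record's continuous field."""
    match = _FUN_TERM.match(term)
    if match is None or match.group("fn") not in _FUNCTIONS:
        raise ValueError(f"unsupported function term: {term!r}")
    fn = _FUNCTIONS[match.group("fn")]
    return fn(float(record[match.group("field")]))


print(function_value({"x": 4.0}, "sqrt(x)"))    # 2.0
print(function_value({"x": 1.0}, "np.log(x)"))  # 0.0
```

Looking functions up in a whitelist avoids evaluating arbitrary code taken from the formula string.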
class ydot.formula.IntExtractor(record, term)
Bases: ydot.formula.Extractor

Intercept extractor. The extracted value is always 1.0.

__init__(record, term)
Constructor.
- Parameters
record – Dictionary.
term – Model term.
- Returns
None.

property value
Gets the extracted value.
class ydot.formula.InteractionExtractor(record, terms)
Bases: object

Interaction extractor for interaction effects.

__init__(record, terms)
Constructor.
- Parameters
record – Dictionary.
terms – Model term (possibly with interaction effects).
- Returns
None.

property value
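In a design matrix, an interaction column holds the product of its components: a numeric field times a 0/1 indicator, or two numeric fields multiplied together. The sketch below assumes patsy-style interaction names joined with `:` (e.g. `color[T.red]:x`); `component_value` and `interaction_value` are hypothetical helpers, not ydot's implementation.

```python
import math
import re

# Hypothetical column-name format for one-hot components, e.g. "color[T.red]".
_LEVEL_TERM = re.compile(r"^(?P<field>\w+)\[T\.(?P<level>[^\]]+)\]$")


def component_value(record: dict, term: str) -> float:
    """Value of one interaction component: a 0/1 indicator or a numeric field (assumed formats)."""
    match = _LEVEL_TERM.match(term)
    if match:
        return 1.0 if str(record.get(match.group("field"))) == match.group("level") else 0.0
    return float(record[term])


def interaction_value(record: dict, term: str) -> float:
    """Multiply the component values of an interaction such as "color[T.red]:x"."""
    return math.prod(component_value(record, part) for part in term.split(":"))


record = {"color": "red", "x": 2.0, "z": 3.0}
print(interaction_value(record, "color[T.red]:x"))  # 2.0
print(interaction_value(record, "x:z"))             # 6.0
```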
class ydot.formula.LvlExtractor(record, term)
Bases: ydot.formula.Extractor

Categorical extractor (with levels).

__init__(record, term)
Constructor.
- Parameters
record – Dictionary.
term – Model term.
- Returns
None.

property value
Gets the extracted value.
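Specifying levels fixes both the set of categories and their order. Under treatment coding (patsy's default), the first listed level is the reference and gets no column of its own, so the declared order decides the baseline. `treatment_coding` is a hypothetical helper illustrating this idea, not ydot's `LvlExtractor`.

```python
def treatment_coding(value, levels):
    """One-hot encode `value` against explicit `levels`, dropping the first (reference) level.

    Hypothetical helper: mirrors patsy's default treatment coding, where the
    first listed level is the reference and gets no column of its own.
    """
    if value not in levels:
        raise ValueError(f"{value!r} is not one of the declared levels {levels!r}")
    return [1.0 if value == level else 0.0 for level in levels[1:]]


levels = ["low", "medium", "high"]
print(treatment_coding("low", levels))   # [0.0, 0.0]  (reference level)
print(treatment_coding("high", levels))  # [0.0, 1.0]
```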
class ydot.formula.TermEnum(value)
Bases: enum.IntEnum

Term types.
CAT: categorical without levels specified
LVL: categorical with levels specified
CON: continuous
FUN: continuous with function transformations
INT: intercept

CAT = 1
LVL = 2
CON = 3
FUN = 4
INT = 5

static get_extractor(record, term)
Gets the associated extractor based on the specified term.
- Parameters
record – Dictionary.
term – Model term.
- Returns
Extractor.
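How a dispatcher like `get_extractor` might pick an extractor can be sketched by classifying a column name and looking its kind up in a table. `TermKind` copies the documented names and values of `TermEnum`; the classification rules in `classify` are hypothetical heuristics based on patsy column naming, and the table maps to class names only so the sketch stays self-contained.

```python
from enum import IntEnum


class TermKind(IntEnum):
    """Hypothetical mirror of ydot.formula.TermEnum: same member names and values."""

    CAT = 1  # categorical without levels specified
    LVL = 2  # categorical with levels specified
    CON = 3  # continuous
    FUN = 4  # continuous with function transformations
    INT = 5  # intercept


# Dispatch table in the spirit of get_extractor: each kind maps to the
# documented extractor class (by name only).
EXTRACTOR_NAMES = {
    TermKind.CAT: "CatExtractor",
    TermKind.LVL: "LvlExtractor",
    TermKind.CON: "ConExtractor",
    TermKind.FUN: "FunExtractor",
    TermKind.INT: "IntExtractor",
}


def classify(term: str) -> TermKind:
    """Guess a term's kind from a patsy-style column name (hypothetical heuristics)."""
    if term == "Intercept":
        return TermKind.INT
    if "[" in term:  # one-hot column such as "color[T.red]"
        return TermKind.LVL if "levels=" in term else TermKind.CAT
    if "(" in term:  # function call such as "log(x)"
        return TermKind.FUN
    return TermKind.CON


for term in ["Intercept", "color[T.red]", "C(size, levels=['s', 'm'])[T.m]", "log(x)", "x"]:
    kind = classify(term)
    print(f"{term}: {kind.name} = {int(kind)} -> {EXTRACTOR_NAMES[kind]}")
```

Because `TermKind` is an `IntEnum`, its members compare equal to plain integers (`TermKind.CON == 3`), which makes them easy to store or serialize.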
Spark
The spark module contains code to transform a Spark dataframe into design matrices as specified by a formula.
ydot.spark.get_columns(formula, sdf, profile=None)
Gets the expanded columns of the specified Spark dataframe using the specified formula.
- Parameters
formula – Formula (R-like, based on patsy).
sdf – Spark dataframe.
profile – Profile. Defaults to None, in which case the profile is determined empirically from the data.
- Returns
Tuple of columns for y and X.
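A simplified sketch of column expansion, assuming an additive formula of plain field names and a profile that maps each categorical field to its sorted levels and each continuous field to `None`. The profile format and `expand_columns` are hypothetical; real patsy formulas support much more (interactions, functions, removing the intercept with `- 1`).

```python
def expand_columns(formula: str, profile: dict):
    """Expand a simple additive formula into (y_columns, X_columns).

    Hypothetical sketch: supports only "lhs ~ a + b + ..." with plain field names.
    `profile` maps each categorical field to its levels; fields that are missing
    or mapped to None are treated as continuous.
    """
    lhs, rhs = (side.strip() for side in formula.split("~", 1))
    x_columns = ["Intercept"]  # patsy adds an intercept unless the formula removes it
    for field in (part.strip() for part in rhs.split("+")):
        levels = profile.get(field)
        if levels is None:  # continuous: one column, named after the field
            x_columns.append(field)
        else:  # categorical: one treatment-coded column per non-reference level
            x_columns.extend(f"{field}[T.{level}]" for level in levels[1:])
    return [lhs], x_columns


profile = {"y": None, "x": None, "color": ["blue", "green", "red"]}
y_cols, x_cols = expand_columns("y ~ x + color", profile)
print(y_cols)  # ['y']
print(x_cols)  # ['Intercept', 'x', 'color[T.green]', 'color[T.red]']
```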
ydot.spark.get_profile(sdf)
Gets the field profiles of the specified Spark dataframe.
- Parameters
sdf – Spark dataframe.
- Returns
Dictionary.
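A sketch of empirical profiling over a list of dictionaries (a stand-in for collected Spark rows): numeric fields are treated as continuous and everything else as categorical with sorted distinct levels. The output format of `profile_records` is hypothetical; ydot's actual profile dictionary is not documented here.

```python
from numbers import Number


def profile_records(records):
    """Profile each field: None for continuous fields, sorted distinct levels for categorical ones.

    Hypothetical sketch over a list of dicts; ydot's real profile format may differ.
    """
    fields = {key for record in records for key in record}
    profile = {}
    for field in sorted(fields):
        values = [r.get(field) for r in records if r.get(field) is not None]
        # bool is a subclass of int, so exclude it explicitly from "numeric".
        numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
        profile[field] = None if numeric else sorted({str(v) for v in values})
    return profile


records = [{"x": 1.5, "color": "red"}, {"x": 2, "color": "blue"}, {"x": 0.5, "color": "red"}]
print(profile_records(records))  # {'color': ['blue', 'red'], 'x': None}
```

On a Spark dataframe, similar information could be gathered with `sdf.dtypes` and, for string columns, `sdf.select(name).distinct().collect()`; whether ydot does it this way is an assumption.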
ydot.spark.smatrices(formula, sdf, profile=None)
Gets a tuple of design/model matrices.
- Parameters
formula – Formula (R-like, based on patsy).
sdf – Spark dataframe.
profile – Dictionary of data profile. Defaults to None.
- Returns
Tuple (y, X) of Spark dataframes.
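A sketch of how y and X rows could be assembled once the expanded column names are known: each cell is 1.0 for the intercept, a 0/1 indicator for a `field[T.level]` column, or the numeric field value otherwise. The column-name conventions and `design_rows` are hypothetical, and plain Python lists stand in for Spark dataframes.

```python
import re

# Hypothetical one-hot column-name format, e.g. "color[T.red]".
_LEVEL_TERM = re.compile(r"^(?P<field>\w+)\[T\.(?P<level>[^\]]+)\]$")


def cell(record: dict, column: str) -> float:
    """Value of one design-matrix cell (assumed column-name conventions)."""
    if column == "Intercept":
        return 1.0
    match = _LEVEL_TERM.match(column)
    if match:
        return 1.0 if str(record.get(match.group("field"))) == match.group("level") else 0.0
    return float(record[column])


def design_rows(records, y_columns, x_columns):
    """Return (y_rows, X_rows) as lists of lists, one row per record."""
    y_rows = [[cell(r, c) for c in y_columns] for r in records]
    x_rows = [[cell(r, c) for c in x_columns] for r in records]
    return y_rows, x_rows


records = [{"y": 1.0, "x": 2.0, "color": "red"}, {"y": 0.0, "x": 3.5, "color": "blue"}]
y, X = design_rows(records, ["y"], ["Intercept", "x", "color[T.red]"])
print(y)  # [[1.0], [0.0]]
print(X)  # [[1.0, 2.0, 1.0], [1.0, 3.5, 0.0]]
```

With Spark, the same per-record mapping could be distributed with `sdf.rdd.map(...)`; this is an assumption about approach, not a description of how `smatrices` works.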