PySpark Formula

Formula

The formula module contains code to extract values from a record (e.g. a row of a Spark dataframe, represented as a dictionary) based on the model definition.

class ydot.formula.CatExtractor(record, term)

Bases: ydot.formula.Extractor

Categorical extractor (no levels).

__init__(record, term)

Constructor.

Parameters
  • record – Dictionary.

  • term – Model term.

Returns

None.

property value

Gets the extracted value.

class ydot.formula.ConExtractor(record, term)

Bases: ydot.formula.Extractor

Continuous extractor (no functions).

__init__(record, term)

Constructor.

Parameters
  • record – Dictionary.

  • term – Model term.

Returns

None.

property value

Gets the extracted value.

class ydot.formula.Extractor(record, term, term_type)

Bases: abc.ABC

Extractor to get value based on model term.

__init__(record, term, term_type)

Constructor.

Parameters
  • record – Dictionary.

  • term – Model term.

  • term_type – Type of term.

Returns

None.

abstract property value

Gets the extracted value.
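
The base-class pattern described above (an abstract `value` property that each concrete extractor overrides) can be sketched in plain Python. This is a minimal illustration, not ydot's source: the names `SketchExtractor` and `SketchIntercept` are hypothetical, and only the constructor signature and the abstract `value` property are taken from the listing above.

```python
from abc import ABC, abstractmethod


class SketchExtractor(ABC):
    """Hypothetical stand-in for an extractor base class."""

    def __init__(self, record, term, term_type):
        self._record = record        # dictionary of field name -> value
        self._term = term            # model term, e.g. "x1"
        self._term_type = term_type  # type of term

    @property
    @abstractmethod
    def value(self):
        """Subclasses return the value extracted from the record."""


class SketchIntercept(SketchExtractor):
    """Hypothetical subclass: an intercept term always evaluates to 1.0."""

    def __init__(self, record, term):
        super().__init__(record, term, "INT")

    @property
    def value(self):
        return 1.0


extractor = SketchIntercept({"x1": 3.5}, "Intercept")
print(extractor.value)  # 1.0
```

Because `value` is abstract, instantiating `SketchExtractor` directly raises a `TypeError`; only subclasses that implement `value` can be created.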

class ydot.formula.FunExtractor(record, term)

Bases: ydot.formula.Extractor

Continuous extractor (with functions defined).

__init__(record, term)

Constructor.

Parameters
  • record – Dictionary.

  • term – Model term.

Returns

None.

property value

Gets the extracted value.
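
As a rough illustration of what extracting a continuous term with a function transformation might involve, the sketch below parses a term such as `np.log(x1)` and applies the named function to the field. It assumes patsy-style term names and a small whitelist of functions; the helper `apply_function_term`, the `FUNCTIONS` table, and the parsing rule are all hypothetical and may not match how ydot evaluates such terms.

```python
import math
import re

# Assumed whitelist of transformations; the set ydot supports may differ.
FUNCTIONS = {
    "sin": math.sin,
    "cos": math.cos,
    "log": math.log,
    "exp": math.exp,
    "abs": abs,
}

# Matches "func(field)" with an optional module prefix such as "np.".
TERM_PATTERN = re.compile(r"^(?:\w+\.)?(\w+)\((\w+)\)$")


def apply_function_term(record, term):
    """Evaluate a term such as 'np.log(x1)' against a record dictionary."""
    match = TERM_PATTERN.match(term)
    if match is None:
        raise ValueError(f"not a function term: {term!r}")
    func_name, field = match.groups()
    return FUNCTIONS[func_name](record[field])


record = {"x1": 1.0, "x2": 0.0}
print(apply_function_term(record, "np.log(x1)"))  # 0.0
print(apply_function_term(record, "np.cos(x2)"))  # 1.0
```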

class ydot.formula.IntExtractor(record, term)

Bases: ydot.formula.Extractor

Intercept extractor. Always returns 1.0.

__init__(record, term)

Constructor.

Parameters
  • record – Dictionary.

  • term – Model term.

Returns

None.

property value

Gets the extracted value.

class ydot.formula.InteractionExtractor(record, terms)

Bases: object

Interaction extractor for interaction effects.

__init__(record, terms)

Constructor.

Parameters
  • record – Dictionary.

  • terms – Model term (possibly with interaction effects).

Returns

None.

property value
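
In a design matrix, the column for an interaction effect is conventionally the product of the columns of its component terms. The sketch below shows that idea for a single record, assuming patsy-style names in which components are joined by `:`. The helpers `interaction_value` and `extract_continuous` are hypothetical and are not part of ydot.

```python
from functools import reduce
from operator import mul


def interaction_value(record, term, extract):
    """Multiply the values of the ':'-separated components of a term."""
    parts = term.split(":")
    return reduce(mul, (extract(record, part) for part in parts), 1.0)


def extract_continuous(record, term):
    """Hypothetical component extractor: look the field up directly."""
    return float(record[term])


record = {"x1": 2.0, "x2": 3.0}
print(interaction_value(record, "x1:x2", extract_continuous))  # 6.0
print(interaction_value(record, "x1", extract_continuous))     # 2.0
```

A term without `:` has a single component, so the same function also covers main effects.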

class ydot.formula.LvlExtractor(record, term)

Bases: ydot.formula.Extractor

Categorical extractor (with levels).

__init__(record, term)

Constructor.

Parameters
  • record – Dictionary.

  • term – Model term.

Returns

None.

property value

Gets the extracted value.
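
For a categorical term with a level, the extracted value is naturally an indicator: 1.0 when the record's field equals that level and 0.0 otherwise. The sketch below assumes patsy-style treatment-coded names such as `C(a)[T.right]` or `a[T.right]`; the helper `level_indicator` and its parsing rule are hypothetical, and ydot's own handling may differ.

```python
import re

# Matches names such as "C(a)[T.right]" or "a[T.right]".
LEVEL_PATTERN = re.compile(r"^(?:C\((\w+)\)|(\w+))\[(?:T\.)?(.+)\]$")


def level_indicator(record, term):
    """Return 1.0 if the record's field equals the level named in the term."""
    match = LEVEL_PATTERN.match(term)
    if match is None:
        raise ValueError(f"not a level term: {term!r}")
    field = match.group(1) or match.group(2)
    level = match.group(3)
    return 1.0 if str(record[field]) == level else 0.0


record = {"a": "right", "b": "low"}
print(level_indicator(record, "C(a)[T.right]"))  # 1.0
print(level_indicator(record, "C(a)[T.left]"))   # 0.0
print(level_indicator(record, "b[T.low]"))       # 1.0
```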

class ydot.formula.TermEnum(value)

Bases: enum.IntEnum

Term types.

  • CAT: categorical without levels specified

  • LVL: categorical with levels specified

  • CON: continuous

  • FUN: continuous with function transformations

  • INT: intercept

CAT = 1
LVL = 2
CON = 3
FUN = 4
INT = 5

static get_extractor(record, term)

Gets the associated extractor based on the specified term.

Parameters
  • record – Dictionary.

  • term – Model term.

Returns

Extractor.
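
Choosing an extractor amounts to classifying the term by its shape. The sketch below shows one plausible set of rules for patsy-style column names. The enum values mirror the listing above, but the class `SketchTermEnum`, the function `classify_term`, and the rules themselves are assumptions for illustration, not ydot's actual dispatch logic.

```python
from enum import IntEnum


class SketchTermEnum(IntEnum):
    """Hypothetical copy of the term types listed above."""
    CAT = 1
    LVL = 2
    CON = 3
    FUN = 4
    INT = 5


def classify_term(term):
    """Guess the term type from a patsy-style column name (assumed rules)."""
    if term == "Intercept":
        return SketchTermEnum.INT
    if "[" in term and term.endswith("]"):
        return SketchTermEnum.LVL   # e.g. "C(a)[T.right]"
    if term.startswith("C(") and term.endswith(")"):
        return SketchTermEnum.CAT   # e.g. "C(a)"
    if "(" in term and term.endswith(")"):
        return SketchTermEnum.FUN   # e.g. "np.sin(x1)"
    return SketchTermEnum.CON       # e.g. "x1"


for name in ["Intercept", "C(a)[T.right]", "C(a)", "np.sin(x1)", "x1"]:
    print(name, classify_term(name).name)
```

The order of the checks matters: a levelled categorical such as `C(a)[T.right]` also starts with `C(`, so the level test runs first.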

Spark

The spark module contains code to transform a Spark dataframe into design matrices as specified by a formula.

ydot.spark.get_columns(formula, sdf, profile=None)

Gets the expanded columns of the specified Spark dataframe using the specified formula.

Parameters
  • formula – Formula (R-like, based on patsy).

  • sdf – Spark dataframe.

  • profile – Profile. Defaults to None, in which case the profile is determined empirically from the dataframe.

Returns

Tuple of columns for y, X.
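
To see why column expansion needs a profile, consider treatment coding: each categorical field expands into one column per non-reference level, so the levels must be known before the columns can be named. The sketch below illustrates this for main effects only. The helper `expand_columns` and the profile format (field name mapped to a sorted list of levels) are assumptions and may not match what ydot produces.

```python
def expand_columns(fields, profile):
    """Expand field names into design-matrix column names.

    Assumed profile format: categorical fields map to a sorted list of
    levels. Fields absent from the profile are treated as continuous.
    """
    columns = ["Intercept"]
    for field in fields:
        levels = profile.get(field)
        if levels:
            # Treatment coding: the first level is the reference, no column.
            columns.extend(f"{field}[T.{level}]" for level in levels[1:])
        else:
            columns.append(field)
    return columns


profile = {"a": ["left", "right"], "b": ["high", "low", "mid"]}
print(expand_columns(["a", "b", "x1"], profile))
# ['Intercept', 'a[T.right]', 'b[T.low]', 'b[T.mid]', 'x1']
```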

ydot.spark.get_profile(sdf)

Gets the field profiles of the specified Spark dataframe.

Parameters
  • sdf – Spark dataframe.

Returns

Dictionary.
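
A field profile can be thought of as a summary of the values each field takes. The sketch below builds one from a list of plain dictionaries by collecting the distinct levels of every string-valued field. The helper `profile_records` and the output format are hypothetical; the structure of the dictionary returned by `get_profile` may differ.

```python
def profile_records(records):
    """Collect the distinct levels of each string-valued field."""
    levels = {}
    for record in records:
        for field, value in record.items():
            if isinstance(value, str):
                levels.setdefault(field, set()).add(value)
    # Sort the levels so the result is deterministic.
    return {field: sorted(values) for field, values in levels.items()}


records = [
    {"a": "left", "b": "low", "x1": 0.5},
    {"a": "right", "b": "high", "x1": 1.5},
    {"a": "left", "b": "mid", "x1": 2.5},
]
print(profile_records(records))
# {'a': ['left', 'right'], 'b': ['high', 'low', 'mid']}
```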

ydot.spark.smatrices(formula, sdf, profile=None)

Gets a tuple of design/model matrices.

Parameters
  • formula – Formula.

  • sdf – Spark dataframe.

  • profile – Dictionary of data profile.

Returns

Tuple of Spark dataframes for y, X.