A model trained in any language can be integrated with tidypredict, and
thus with broom. The only requirement is that the model is exported from
that language using the parse model spec. The easiest file format to use
is YAML.

The example here is a model fitted using sklearn’s linear_model on the
diabetes data. Ten baseline variables (age, sex, body mass index, average
blood pressure, and six blood serum measurements) were obtained for each
of n = 442 diabetes patients, as well as the response of interest, a
quantitative measure of disease progression one year after baseline. The
same Python script that fitted the model also converted its results to
YAML. The top part of that output is copied here:

general:
  is_glm: 0
  model: lm
  residual: 0
  sigma2: 0
  type: regression
  version: 2.0
terms:
- coef: 152.76430691633442
  fields:
  - col: (Intercept)
    type: ordinary
  is_intercept: 1
  label: (Intercept)
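
The export step on the Python side is not shown in this article, so the following is only a minimal sketch of how such a file could be produced. The helper names build_spec() and spec_to_yaml() are hypothetical, the field names are taken from the excerpt above, and the hand-written emitter only handles this fixed layout. A YAML library would normally do the writing.

```python
# Hypothetical helpers: the field names mirror the YAML excerpt above,
# but this is not the script that produced it.

def build_spec(intercept, coefs):
    """Return a dict laid out like the parse model spec excerpt.

    coefs maps a column name to its fitted coefficient.
    """
    def term(label, coef, is_intercept):
        return {
            "coef": float(coef),  # also converts NumPy scalars to plain floats
            "fields": [{"col": label, "type": "ordinary"}],
            "is_intercept": int(is_intercept),
            "label": label,
        }

    terms = [term("(Intercept)", intercept, True)]
    terms += [term(name, value, False) for name, value in coefs.items()]
    return {
        "general": {
            "is_glm": 0,
            "model": "lm",
            "residual": 0,
            "sigma2": 0,
            "type": "regression",
            "version": 2.0,
        },
        "terms": terms,
    }


def spec_to_yaml(spec):
    """Write the spec as block-style YAML text (this fixed layout only)."""
    lines = ["general:"]
    for key, value in spec["general"].items():
        lines.append(f"  {key}: {value}")
    lines.append("terms:")
    for t in spec["terms"]:
        lines.append(f"- coef: {t['coef']!r}")
        lines.append("  fields:")
        for field in t["fields"]:
            lines.append(f"  - col: {field['col']}")
            lines.append(f"    type: {field['type']}")
        lines.append(f"  is_intercept: {t['is_intercept']}")
        lines.append(f"  label: {t['label']}")
    return "\n".join(lines) + "\n"


# Two of the coefficients from the excerpt, used as sample input.
spec = build_spec(152.76430691633442, {"age": 0.3034995490660432})
print(spec_to_yaml(spec))
```

With a fitted sklearn linear model, the inputs would presumably come from its intercept_ and coef_ attributes, paired with the feature names.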

The YAML data can be read into R with the yaml package. In this example,
we have copied and pasted most of the model into a variable called
sklearn_model. We use strsplit() to break the pasted text into one
element per line before handing it to yaml.

library(yaml)

sklearn_model <- strsplit("general:
  is_glm: 0
  model: lm
  residual: 0
  sigma2: 0
  type: regression
  version: 2.0
terms:
- coef: 152.76430691633442
  fields:
  - col: (Intercept)
    type: ordinary
  is_intercept: 1
  label: (Intercept)
- coef: 0.3034995490660432
  fields:
  - col: age
    type: ordinary
  is_intercept: 0
  label: age
- coef: -237.63931533353403
  fields:
  - col: sex
    type: ordinary
  is_intercept: 0
  label: sex
- coef: 510.5306054362253
  fields:
  - col: bmi
    type: ordinary
  is_intercept: 0
  label: bmi
- coef: 327.7369804093466
  fields:
  - col: bp
    type: ordinary
  is_intercept: 0
  label: bp
- coef: -814.1317093725387
  fields:
  - col: s1
    type: ordinary
  is_intercept: 0
  label: s1
", split = "\n")[[1]]

Now the model is converted to an R list using yaml.load():

sklearn_model <- yaml.load(sklearn_model)
str(sklearn_model, 2)
#> List of 2
#>  $ general:List of 6
#>   ..$ is_glm  : int 0
#>   ..$ model   : chr "lm"
#>   ..$ residual: int 0
#>   ..$ sigma2  : int 0
#>   ..$ type    : chr "regression"
#>   ..$ version : num 2
#>  $ terms  :List of 6
#>   ..$ :List of 4
#>   ..$ :List of 4
#>   ..$ :List of 4
#>   ..$ :List of 4
#>   ..$ :List of 4
#>   ..$ :List of 4

tidypredict

The list object needs to be recognized as a tidypredict parsed model. To
do that, we use as_parsed_model():

library(tidypredict)
spm <- as_parsed_model(sklearn_model)
class(spm)
#> [1] "parsed_model"  "pm_regression" "list"

The spm variable now works just like any other parsed model inside R. Use
tidypredict_fit() to view the resulting formula:

tidypredict_fit(spm)
#> 152.764306916334 + (age * 0.303499549066043) + (sex * -237.639315333534) +
#>     (bmi * 510.530605436225) + (bp * 327.736980409347) + (s1 *
#>     -814.131709372539)

Now the model can be translated to SQL and run inside a database:

tidypredict_sql(spm, dbplyr::simulate_mssql())
#> <SQL> ((((152.764306916334 + (`age` * 0.303499549066043)) + (`sex` * -237.639315333534)) + (`bmi` * 510.530605436225)) + (`bp` * 327.736980409347)) + (`s1` * -814.131709372539)

broom

Now that we have a parsed_model object, it is possible to use broom’s
tidy() function. This means that we can integrate an entirely external
model with broom:

tidy(spm)
#> # A tibble: 6 × 2
#>   term        estimate
#>   <chr>          <dbl>
#> 1 (Intercept)  153.
#> 2 age            0.303
#> 3 sex         -238.
#> 4 bmi          511.
#> 5 bp           328.
#> 6 s1          -814.
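
The six terms reported by tidy() can also be cross-checked on the exporting side. The sketch below evaluates the same linear formula in plain Python, with coefficients copied from the YAML above. The predict() helper and the input row are hypothetical. It covers only the six terms pasted into this example, so it may not match predictions from the full sklearn model.

```python
# Coefficients copied from the YAML export shown earlier in this article.
COEFS = {
    "(Intercept)": 152.76430691633442,
    "age": 0.3034995490660432,
    "sex": -237.63931533353403,
    "bmi": 510.5306054362253,
    "bp": 327.7369804093466,
    "s1": -814.1317093725387,
}


def predict(row, coefs=COEFS):
    """Evaluate intercept + sum(coef * value) for one row of predictors."""
    total = coefs["(Intercept)"]
    for name, coef in coefs.items():
        if name != "(Intercept)":
            total += coef * row[name]
    return total


# Hypothetical input row, not taken from the diabetes data.
row = {"age": 0.0, "sex": 0.0, "bmi": 0.0, "bp": 0.0, "s1": 0.0}
print(predict(row))  # with all-zero predictors only the intercept remains
```

Running the same row through the SQL produced by tidypredict_sql() should give a matching value, up to floating-point rounding.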