I have the following XML which I am trying to convert into a data frame in R or a table within SQL:
<Root> <Product> <Test_ID value="1" /> <Effective_Date value="2022-01-01" /> <Membership value="Yes" /> <Request> <Request_ID value="1" /> <Request_type value="Simple" /> </Request> <Request> <Request_ID value="2" /> <Request_type value="Complex" /> </Request> </Product> </Root>
I’ve tried using XMLtoDataFrame (“xml” library) but this just creates a table without any values. Would anyone be able to help with this please? Open to a SQL solution.
Edit: I’m looking for my table to be in the following format table
Advertisement
Answer
Here’s an effort.
Assumptions:
Root
is always there, and always just oneProduct
can contain zero or more, so we’ll iterate smartly and combine- I do not assume that
$Product$Requests
are always perfectly aligned; it can be simplified (hard-coded fewer lines of code) if this is a guaranteed format.
Some helper functions:
func1 <- function(z) if (is.null(names(z))) attr(z, "value") else lapply(z, func1) merge.list <- function(A, B) { # normalize lengths, just in case, since I think you have more than one $Product A <- lapply(A, `length<-`, max(lengths(A))) B <- lapply(B, `length<-`, max(lengths(B))) BnotA <- setdiff(names(B), names(A)) AnotB <- setdiff(names(A), names(B)) inboth <- intersect(names(A), names(B)) A[BnotA] <- replicate(length(BnotA), rep(NA, max(lengths(A))), simplify = FALSE) A[AnotB] <- lapply(A[AnotB], function(z) c(z, rep(NA, max(lengths(B))))) A[inboth] <- Map(c, A[inboth], B[inboth]) A }
Code:
# library(xml2) # read_xml, as_list L <- xml2::as_list(xml2::read_xml(' <Root> <Product> <Test_ID value="1" /> <Effective_Date value="2022-01-01" /> <Membership value="Yes" /> <Request> <Request_ID value="1" /> <Request_type value="Simple" /> </Request> <Request> <Request_ID value="2" /> <Request_type value="Complex" /> </Request> </Product> </Root>')) out <- do.call(rbind.data.frame, lapply(func1(L$Root), function(pr) { as.data.frame(lapply(split(pr, names(pr)), function(Y) Reduce(merge.list, Y))) })) out # Effective_Date Membership Request.Request_ID Request.Request_type Test_ID # Product.1 2022-01-01 Yes 1 Simple 1 # Product.2 2022-01-01 Yes 2 Complex 1
(It’s a “clean” data.frame
🙂
str(out) # 'data.frame': 2 obs. of 5 variables: # $ Effective_Date : chr "2022-01-01" "2022-01-01" # $ Membership : chr "Yes" "Yes" # $ Request.Request_ID : chr "1" "2" # $ Request.Request_type: chr "Simple" "Complex" # $ Test_ID : chr "1" "1"