R 列表到数据帧

I have a nested list of data. Its length is 132 and each item is a list of length 20. Is there a quick way to convert this structure into a data frame that has 132 rows and 20 columns of data?

Here is some sample data to work with:

l <- replicate(
  132,
  list(sample(letters, 20)),
  simplify = FALSE
)

转载于:https://stackoverflow.com/questions/4227223/r-list-to-data-frame

Assuming your list of lists is called l:

df <- data.frame(matrix(unlist(l), nrow=132, byrow=T))

The above will convert all character columns to factors, to avoid this you can add a parameter to the data.frame() call:

df <- data.frame(matrix(unlist(l), nrow=132, byrow=T),stringsAsFactors=FALSE)

With rbind

do.call(rbind.data.frame, your_list)

Edit: Previous version return data.frame of list's instead of vectors (as @IanSudbery pointed out in comments).

data.frame(t(sapply(mylistlist,c)))

sapply converts it to a matrix. data.frame converts the matrix to a data frame.

You can use the plyr package. For example a nested list of the form

l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
      , b = list(var.1 = 4, var.2 = 5, var.3 = 6)
      , c = list(var.1 = 7, var.2 = 8, var.3 = 9)
      , d = list(var.1 = 10, var.2 = 11, var.3 = 12)
      )

has now a length of 4 and each list in l contains another list of the length 3. Now you can run

  library (plyr)
  df <- ldply (l, data.frame)

and should get the same result as in the answer @Marek and @nico.

More answers, along with timings in the answer to this question: What is the most efficient way to cast a list as a data frame?

The quickest way, that doesn't produce a dataframe with lists rather than vectors for columns appears to be (from Martin Morgan's answer):

l <- list(list(col1="a",col2=1),list(col1="b",col2=2))
f = function(x) function(i) unlist(lapply(x, `[[`, i), use.names=FALSE)
as.data.frame(Map(f(l), names(l[[1]])))

The package data.table has the function rbindlist which is a superfast implementation of do.call(rbind, list(...)).

It can take a list of lists, data.frames or data.tables as input.

library(data.table)
ll <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
  , b = list(var.1 = 4, var.2 = 5, var.3 = 6)
  , c = list(var.1 = 7, var.2 = 8, var.3 = 9)
  , d = list(var.1 = 10, var.2 = 11, var.3 = 12)
  )

DT <- rbindlist(ll)

This returns a data.table inherits from data.frame.

If you really want to convert back to a data.frame use as.data.frame(DT)

Reshape2 yields the same output as the plyr example above:

library(reshape2)
l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
          , b = list(var.1 = 4, var.2 = 5, var.3 = 6)
          , c = list(var.1 = 7, var.2 = 8, var.3 = 9)
          , d = list(var.1 = 10, var.2 = 11, var.3 = 12)
)
l <- melt(l)
dcast(l, L1 ~ L2)

yields:

  L1 var.1 var.2 var.3
1  a     1     2     3
2  b     4     5     6
3  c     7     8     9
4  d    10    11    12

If you were almost out of pixels you could do this all in 1 line w/ recast().

assume your list is called L,

data.frame(Reduce(rbind, L))

Extending on @Marek's answer: if you want to avoid strings to be turned into factors and efficiency is not a concern try

do.call(rbind, lapply(your_list, data.frame, stringsAsFactors=FALSE))

This is what finally worked for me:

do.call("rbind", lapply(S1, as.data.frame))

l <- replicate(10,list(sample(letters, 20)))
a <-lapply(l[1:10],data.frame)
do.call("cbind", a)

Sometimes your data may be a list of lists of vectors of the same length.

lolov = list(list(c(1,2,3),c(4,5,6)), list(c(7,8,9),c(10,11,12),c(13,14,15)) )

(The inner vectors could also be lists, but I'm simplifying to make this easier to read).

Then you can make the following modification. Remember that you can unlist one level at a time:

lov = unlist(lolov, recursive = FALSE )
> lov
[[1]]
[1] 1 2 3

[[2]]
[1] 4 5 6

[[3]]
[1] 7 8 9

[[4]]
[1] 10 11 12

[[5]]
[1] 13 14 15

Now use your favorite method mentioned in the other answers:

library(plyr)
>ldply(lov)
  V1 V2 V3
1  1  2  3
2  4  5  6
3  7  8  9
4 10 11 12
5 13 14 15

For the general case of deeply nested lists with 3 or more levels like the ones obtained from a nested JSON:

{
"2015": {
  "spain": {"population": 43, "GNP": 9},
  "sweden": {"population": 7, "GNP": 6}},
"2016": {
  "spain": {"population": 45, "GNP": 10},
  "sweden": {"population": 9, "GNP": 8}}
}

consider the approach of melt() to convert the nested list to a tall format first:

myjson <- jsonlite:fromJSON(file("test.json"))
tall <- reshape2::melt(myjson)[, c("L1", "L2", "L3", "value")]
    L1     L2         L3 value
1 2015  spain population    43
2 2015  spain        GNP     9
3 2015 sweden population     7
4 2015 sweden        GNP     6
5 2016  spain population    45
6 2016  spain        GNP    10
7 2016 sweden population     9
8 2016 sweden        GNP     8

followed by dcast() then to wide again into a tidy dataset where each variable forms a a column and each observation forms a row:

wide <- reshape2::dcast(tall, L1+L2~L3) 
# left side of the formula defines the rows/observations and the 
# right side defines the variables/measurements
    L1     L2 GNP population
1 2015  spain   9         43
2 2015 sweden   6          7
3 2016  spain  10         45
4 2016 sweden   8          9

The tibble package has a function enframe() that solves this problem by coercing nested list objects to nested tibble ("tidy" data frame) objects. Here's a brief example from R for Data Science:

x <- list(
    a = 1:5,
    b = 3:4, 
    c = 5:6
) 

df <- enframe(x)
df
#> # A tibble: 3 × 2
#>    name     value
#>   <chr>    <list>
#>    1     a <int [5]>
#>    2     b <int [2]>
#>    3     c <int [2]>

Since you have several nests in your list, l, you can use the unlist(recursive = FALSE) to remove unnecessary nesting to get just a single hierarchical list and then pass to enframe(). I use tidyr::unnest() to unnest the output into a single level "tidy" data frame, which has your two columns (one for the group name and one for the observations with the groups value). If you want columns that make wide, you can add a column using add_column() that just repeats the order of the values 132 times. Then just spread() the values.


library(tidyverse)

l <- replicate(
    132,
    list(sample(letters, 20)),
    simplify = FALSE
)

l_tib <- l %>% 
    unlist(recursive = FALSE) %>% 
    enframe() %>% 
    unnest()
l_tib
#> # A tibble: 2,640 x 2
#>     name value
#>    <int> <chr>
#> 1      1     d
#> 2      1     z
#> 3      1     l
#> 4      1     b
#> 5      1     i
#> 6      1     j
#> 7      1     g
#> 8      1     w
#> 9      1     r
#> 10     1     p
#> # ... with 2,630 more rows

l_tib_spread <- l_tib %>%
    add_column(index = rep(1:20, 132)) %>%
    spread(key = index, value = value)
l_tib_spread
#> # A tibble: 132 x 21
#>     name   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`
#> *  <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1      1     d     z     l     b     i     j     g     w     r     p     y
#> 2      2     w     s     h     r     i     k     d     u     a     f     j
#> 3      3     r     v     q     s     m     u     j     p     f     a     i
#> 4      4     o     y     x     n     p     i     f     m     h     l     t
#> 5      5     p     w     v     d     k     a     l     r     j     q     n
#> 6      6     i     k     w     o     c     n     m     b     v     e     q
#> 7      7     c     d     m     i     u     o     e     z     v     g     p
#> 8      8     f     s     e     o     p     n     k     x     c     z     h
#> 9      9     d     g     o     h     x     i     c     y     t     f     j
#> 10    10     y     r     f     k     d     o     b     u     i     x     s
#> # ... with 122 more rows, and 9 more variables: `12` <chr>, `13` <chr>,
#> #   `14` <chr>, `15` <chr>, `16` <chr>, `17` <chr>, `18` <chr>,
#> #   `19` <chr>, `20` <chr>

test1 <- list( c(a='a',b='b',c='c'), c(a='d',b='e',c='f')) as.data.frame(test1) a b c 1 a b c 2 d e f

test2 <- list( c('a','b','c'), c(a='d',b='e',c='f'))

as.data.frame(test2) a b c 1 a b c 2 d e f

test3 <- list('Row1'=c(a='a',b='b',c='c'), 'Row2'=c(a='d',var2='e',var3='f'))

as.data.frame(test3) a b c var2 var3 Row1 a b c
Row2 d e f

This method uses a tidyverse package (purrr).

The list:

x <- as.list(mtcars)

Converting it into a data frame (a tibble more specifically):

library(purrr)
map_df(x, ~.x)

Depending on the structure of your lists there are some tidyverse options that work nicely with unequal length lists:

l <- list(a = list(var.1 = 1, var.2 = 2, var.3 = 3)
        , b = list(var.1 = 4, var.2 = 5)
        , c = list(var.1 = 7, var.3 = 9)
        , d = list(var.1 = 10, var.2 = 11, var.3 = NA))

df <- dplyr::bind_rows(l)
df <- purrr::map_df(l, dplyr::bind_rows)
df <- purrr::map_df(l, ~.x)

# all create the same data frame:
# A tibble: 4 x 3
  var.1 var.2 var.3
  <dbl> <dbl> <dbl>
1     1     2     3
2     4     5    NA
3     7    NA     9
4    10    11    NA

You can also mix vectors and data frames:

library(dplyr)
bind_rows(
  list(a = 1, b = 2),
  data_frame(a = 3:4, b = 5:6),
  c(a = 7)
)

# A tibble: 4 x 2
      a     b
  <dbl> <dbl>
1     1     2
2     3     5
3     4     6
4     7    NA