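I have an R data frame with 6 columns, and I want to create a new data frame that contains only three of them.
Suppose my data frame is df and I want to extract columns A, B, and E.
This is the only command I could find: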
data.frame(df$A,df$B,df$E)
Is there a more compact way to do this?
Using the dplyr package, if your data.frame is called df1:
library(dplyr)
df1 %>%
select(A, B, E)
This can also be written without the %>% pipe as:
select(df1, A, B, E)
You can subset using a vector of column names. I strongly prefer this approach over those that treat column names as if they are object names (e.g. subset()), especially when programming in functions, packages, or applications.
# data for reproducible example
# (and to avoid confusion from trying to subset `stats::df`)
df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])
# subset
df[,c("A","B","E")]
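As a rough sketch of why a character vector of names is convenient inside functions: the names are ordinary data, so they can be stored in a variable, checked, and passed as an argument. The helper keep_cols() and the variable wanted are hypothetical names used only for illustration.

```r
# Hypothetical helper (not part of base R): keep only the requested columns.
keep_cols <- function(data, cols) {
  # Fail early with a readable message if a requested column is absent.
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # drop = FALSE keeps the result a data frame even for a single column.
  data[, cols, drop = FALSE]
}

df <- setNames(data.frame(as.list(1:5)), LETTERS[1:5])
wanted <- c("A", "B", "E")   # column names held as plain data
res <- keep_cols(df, wanted)
names(res)
```

Because the column names are just strings, the same call works unchanged when wanted is built at run time, for example from user input or a config file.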
There are two obvious choices: Joshua Ulrich's df[,c("A","B","E")] or df[,c(1,2,5)], as in
> df <- data.frame(A=c(1,2),B=c(3,4),C=c(5,6),D=c(7,7),E=c(8,8),F=c(9,9))
> df
  A B C D E F
1 1 3 5 7 8 9
2 2 4 6 7 8 9
> df[,c(1,2,5)]
  A B E
1 1 3 8
2 2 4 8
> df[,c("A","B","E")]
  A B E
1 1 3 8
2 2 4 8
This is the role of the subset() function:
> dat <- data.frame(A=c(1,2),B=c(3,4),C=c(5,6),D=c(7,7),E=c(8,8),F=c(9,9))
> subset(dat, select=c("A", "B"))
  A B
1 1 3
2 2 4
Again using subset() (which is base R, not dplyr), this time selecting columns by position, where df1 is your original data frame:
df2 <- subset(df1, select = c(1, 2, 5))
[ and subset() are not interchangeable: [ returns a vector rather than a data frame if only one column is selected.
df = data.frame(a="a",b="b")
identical(
df[,c("a")],
subset(df,select="a")
)
identical(
df[,c("a","b")],
subset(df,select=c("a","b"))
)
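A minimal sketch of how to make [ keep the data frame structure for a single column: pass drop = FALSE, or use list-style indexing with a single index. The variable names below are made up for the example.

```r
df <- data.frame(a = "a", b = "b")

as_vector  <- df[, "a"]                 # default: the single column is dropped to a vector
as_frame   <- df[, "a", drop = FALSE]   # drop = FALSE keeps a one-column data frame
list_style <- df["a"]                   # single-index form also returns a data frame

is.data.frame(as_vector)
is.data.frame(as_frame)
is.data.frame(list_style)
```

With drop = FALSE the result has the same shape as what subset(df, select = "a") returns, which matters when later code expects a data frame.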
You can also use the sqldf package, which runs SQL select statements on R data frames:
library(sqldf)
df1 <- sqldf("select A, B, E from df")
This gives as output a data frame df1 with columns A, B, and E.
For some reason, only the following worked for me:
df[, (names(df) %in% c("A","B","E"))]
All of the other syntaxes above yielded "undefined columns selected".
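One plausible explanation, and this is only an assumption about that poster's data, is that at least one of the requested names was not actually a column of the data frame. Indexing by name raises an error in that case, while the %in% form silently skips names that are absent. A small sketch with a made-up data frame that lacks column E:

```r
# Made-up data: note there is no column E here.
df <- data.frame(A = 1:2, B = 3:4, C = 5:6)
wanted <- c("A", "B", "E")

# Which requested names are not columns of df?
not_found <- setdiff(wanted, names(df))

# The %in% form builds a logical mask over the existing columns,
# so the missing name is ignored rather than causing an error.
res <- df[, names(df) %in% wanted, drop = FALSE]

# Indexing by name fails; capture the error message instead of stopping.
err <- tryCatch(df[, wanted], error = function(e) conditionMessage(e))
```

Checking setdiff(wanted, names(df)) first, or inspecting names(df) for typos and stray whitespace, is usually safer than relying on the silent behaviour.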
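If I want to find all the placements in a table that contain a particular string, along with the other values associated with them, how should I go about it? Is there any code I could use as a reference?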
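A possible starting point, assuming the table is a data frame with a character column named placement (the table, its column names, and the search string below are all invented for illustration): grepl() returns a logical vector that can be used to pick the matching rows while keeping every other column.

```r
# Invented example data; the column names are assumptions.
tbl <- data.frame(
  placement = c("home_banner", "sidebar", "home_footer"),
  clicks    = c(10, 3, 7),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats the pattern as a literal string, not a regular expression.
matches <- grepl("home", tbl$placement, fixed = TRUE)

# Keep the matching rows and all of their columns.
hits <- tbl[matches, , drop = FALSE]
hits
```

Dropping fixed = TRUE allows a regular expression as the pattern, and adding ignore.case = TRUE to that regular-expression form makes the match case-insensitive.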