How do you convert a data frame column to a numeric type?
Reprinted from: https://stackoverflow.com/questions/2288485/how-to-convert-a-data-frame-column-to-numeric-type
Since (still) nobody has gotten a check mark, I assume that you have some practical issue in mind, mostly because you haven't specified what type of vector you want to convert to numeric. I suggest that you apply the transform function in order to complete your task.
Now I'm about to demonstrate a certain "conversion anomaly":
# create dummy data.frame
d <- data.frame(char = letters[1:5],
fake_char = as.character(1:5),
fac = factor(1:5),
char_fac = factor(letters[1:5]),
num = 1:5, stringsAsFactors = FALSE)
Let us have a glance at the data.frame:
> d
char fake_char fac char_fac num
1 a 1 1 a 1
2 b 2 2 b 2
3 c 3 3 c 3
4 d 4 4 d 4
5 e 5 5 e 5
and let us run:
> sapply(d, mode)
char fake_char fac char_fac num
"character" "character" "numeric" "numeric" "numeric"
> sapply(d, class)
char fake_char fac char_fac num
"character" "character" "factor" "factor" "integer"
Now you're probably asking yourself, "Where's the anomaly?" Well, I've bumped into quite peculiar things in R, and this is not the most confounding one, but it can confuse you, especially if you read this before rolling into bed.
Here goes: the first two columns are character. I've deliberately called the 2nd one fake_char. Spot the similarity of this character variable with the one that Dirk created in his reply: it's actually a numerical vector converted to character. The 3rd and 4th columns are factor, and the last one is "purely" numeric.
If you use the transform function, you can convert fake_char into numeric, but not the char variable itself.
> transform(d, char = as.numeric(char))
char fake_char fac char_fac num
1 NA 1 1 a 1
2 NA 2 2 b 2
3 NA 3 3 c 3
4 NA 4 4 d 4
5 NA 5 5 e 5
Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
but if you do the same thing on fake_char and char_fac, you'll be lucky and get away with no NAs (for char_fac you actually get the factor's level codes, which here happen to be 1 to 5):
> transform(d, fake_char = as.numeric(fake_char),
char_fac = as.numeric(char_fac))
char fake_char fac char_fac num
1 a 1 1 1 1
2 b 2 2 2 2
3 c 3 3 3 3
4 d 4 4 4 4
5 e 5 5 5 5
If you save the transformed data.frame and check its mode and class, you'll get:
> D <- transform(d, fake_char = as.numeric(fake_char),
char_fac = as.numeric(char_fac))
> sapply(D, mode)
char fake_char fac char_fac num
"character" "numeric" "numeric" "numeric" "numeric"
> sapply(D, class)
char fake_char fac char_fac num
"character" "numeric" "factor" "numeric" "integer"
So, the conclusion is: yes, you can convert a character vector into a numeric one, but only if its elements are "convertible" to numeric. If there's even one non-numeric element in the vector, that element becomes NA and you'll get the warning "NAs introduced by coercion" when converting the vector to numeric.
And just to prove my point:
> err <- c(1, "b", 3, 4, "e")
> mode(err)
[1] "character"
> class(err)
[1] "character"
> char <- as.numeric(err)
Warning message:
NAs introduced by coercion
> char
[1] 1 NA 3 4 NA
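To find out which elements block a clean conversion, one option is to convert with the warning suppressed and keep the entries that became NA but were not NA to begin with. A minimal sketch; err mirrors the vector above and vals is just an illustrative name:

```r
# A character vector with two non-numeric entries
err <- c(1, "b", 3, 4, "e")

# Convert, silencing the "NAs introduced by coercion" warning
vals <- suppressWarnings(as.numeric(err))

# Entries that became NA during conversion (and were not NA before)
bad <- err[is.na(vals) & !is.na(err)]
bad   # "b" "e"
```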
And now, just for fun (or practice), try to guess the output of these commands:
> fac <- as.factor(err)
> fac
???
> num <- as.numeric(fac)
> num
???
Kind regards to Patrick Burns! =)
If x is the name of a column of the data frame dat, and x is of type factor, use:
as.numeric(as.character(dat$x))
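The as.character() step matters because a factor stores integer codes that index its levels, and as.numeric() on the factor returns those codes. R FAQ 7.10 also suggests indexing the numeric levels by the factor, which converts each distinct level only once. A small sketch with a made-up factor f:

```r
# A factor whose labels look like numbers; levels are sorted as strings
f <- factor(c("10", "5", "10", "20"))
levels(f)                       # "10" "20" "5"

as.numeric(f)                   # internal codes: 1 3 1 2 (not the values!)
as.numeric(as.character(f))     # 10 5 10 20
as.numeric(levels(f))[f]        # 10 5 10 20, converting each level once
```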
Tim is correct, and Shane has an omission. Here are additional examples:
R> df <- data.frame(a = as.character(10:15), stringsAsFactors = TRUE)  # a factor, as was the default before R 4.0.0
R> df <- data.frame(df, num = as.numeric(df$a),
numchr = as.numeric(as.character(df$a)))
R> df
a num numchr
1 10 1 10
2 11 2 11
3 12 3 12
4 13 4 13
5 14 5 14
6 15 6 15
R> summary(df)
a num numchr
10:1 Min. :1.00 Min. :10.0
11:1 1st Qu.:2.25 1st Qu.:11.2
12:1 Median :3.50 Median :12.5
13:1 Mean :3.50 Mean :12.5
14:1 3rd Qu.:4.75 3rd Qu.:13.8
15:1 Max. :6.00 Max. :15.0
R>
Our data.frame now has a summary of the factor column (counts) and numeric summaries of the as.numeric() result, which is wrong because it got the factor's internal integer codes, and the (correct) summary of the as.numeric(as.character()) result.
Something that has helped me: if you have ranges of variables to convert (or just more than one), you can use sapply (or lapply).
A bit nonsensical but just for example:
data(cars)
cars[, 1:2] <- lapply(cars[, 1:2], as.factor)  # lapply keeps the factor class; sapply would simplify to a character matrix
Say columns 3, 6-15 and 37 of your data frame need to be converted to numeric; one could do:
dat[, c(3,6:15,37)] <- sapply(dat[, c(3,6:15,37)], as.numeric)
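If the column positions may change, selecting by name is more robust, and lapply() hands each converted column back as a plain vector. A minimal sketch; the data frame dat and its column names are hypothetical:

```r
# Hypothetical data frame with numbers stored as text and as a factor
dat <- data.frame(id = c("a", "b", "c"),
                  x  = c("1.5", "2", "3.25"),
                  y  = factor(c("10", "20", "30")),
                  stringsAsFactors = FALSE)

cols <- c("x", "y")  # columns to convert, by name

# as.character() first, so the factor column y yields 10, 20, 30 (not its codes)
dat[cols] <- lapply(dat[cols], function(col) as.numeric(as.character(col)))

sapply(dat, class)
```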
With the following code you can convert all data frame columns to numeric (X is the data frame whose columns we want to convert; factor columns would need as.character() first):
as.data.frame(lapply(X, as.numeric))
And for converting a whole matrix to numeric, you have two options. Either:
mode(X) <- "numeric"
or:
X <- apply(X, 2, as.numeric)
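Both options can be checked on a small character matrix; mode<- keeps the matrix's attributes (dimensions and dimnames), while apply() builds a new matrix from the converted columns. A sketch with a made-up matrix m:

```r
# A character matrix of numbers, e.g. parsed from text
m <- matrix(c("1", "2.5", "3", "4"), nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b")))

# Option 1: change the mode in place (dim and dimnames are kept)
m1 <- m
mode(m1) <- "numeric"

# Option 2: convert column by column
m2 <- apply(m, 2, as.numeric)

m1
m2
```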
Alternatively, you can use the data.matrix function to convert everything to numeric, but be aware that it replaces factors (and character columns) with their internal integer codes, so it is safer to convert every column to numeric via character first:
X <- as.data.frame(lapply(X, function(col) as.numeric(as.character(col))))
X <- data.matrix(X)
I usually use this last one if I want to convert to matrix and numeric simultaneously.
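Why the factor caveat matters: data.matrix() turns a factor column into its level positions, so numbers stored as a factor do not come out as their values. A quick check with a made-up data frame df:

```r
df <- data.frame(f = factor(c("10", "5", "20")))

# Levels are sorted as strings ("10", "20", "5"), so the codes are 1 3 2
data.matrix(df)[, "f"]

# Converting through character gives the intended values: 10 5 20
as.numeric(as.character(df$f))
```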
Though others have covered the topic pretty well, I'd like to add this quick additional hint: you could use a regular expression to check in advance whether the values of a column potentially consist only of numbers.
# flag the columns whose values contain no letters
potential_numcol <- logical(ncol(d))
for (i in seq_along(d)) {
  potential_numcol[i] <- all(!grepl("[a-zA-Z]", d[, i]))
}
# and now convert only those columns, leaving the others untouched
d[potential_numcol] <- lapply(d[potential_numcol], function(x) as.numeric(as.character(x)))
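The [a-zA-Z] test still lets through values such as "1,5" or "?" that as.numeric() cannot parse. A stricter, anchored pattern for plain decimal numbers (optional sign, optional exponent) is sketched below; the pattern and the helper is_numeric_col are illustrative assumptions, and they ignore cases like surrounding spaces, "Inf" or hexadecimal input:

```r
# Anchored pattern: optional sign, digits with optional decimal part, optional exponent
num_pattern <- "^[-+]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][-+]?[0-9]+)?$"

x <- c("12", "-3.5", "1e3", ".5", "1,5", "abc", "?")
looks_numeric <- grepl(num_pattern, x)
looks_numeric   # TRUE TRUE TRUE TRUE FALSE FALSE FALSE

# Hypothetical helper: a column qualifies only if every non-missing value matches
is_numeric_col <- function(col) {
  col <- as.character(col[!is.na(col)])
  all(grepl(num_pattern, col))
}
```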
For more sophisticated regular expressions and a neat way to learn and experience their power, see this really nice website: http://regexr.com/
I would have added a comment, but I can't (my reputation is too low).
Just to add to the answers of user276042 and pangratz:
dat$x = as.numeric(as.character(dat$x))
This will overwrite the values of the existing column x.
To convert a factor column of a data frame to numeric, you just have to do:
data_frame$column <- as.numeric(as.character(data_frame$column))
If you run into problems with:
as.numeric(as.character(dat$x))
take a look at your decimal marks. If they are "," instead of "." (e.g. "5,3"), the above won't work.
A potential solution is:
as.numeric(gsub(",", ".", dat$x))
I believe this is quite common in some non-English-speaking countries.
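If the values also use "." as a thousands separator (e.g. "1.234,5"), those need to be removed first; fixed = TRUE makes gsub() treat "." literally rather than as the regex wildcard. When the data come from a file, the dec argument of read.table()/read.csv() (read.csv2() already defaults to dec = ",") handles the decimal comma at import time. A sketch assuming that input format:

```r
x <- c("1.234,5", "5,3", "12")

# Drop "." thousands separators, then turn the decimal "," into "."
clean <- gsub(",", ".", gsub(".", "", x, fixed = TRUE), fixed = TRUE)
as.numeric(clean)   # 1234.5 5.3 12

# Reading from text (or a file): declare the decimal mark up front
d2 <- read.table(text = "v\n5,3\n2,75", header = TRUE, dec = ",")
is.numeric(d2$v)
```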
Universal way using type.convert() and rapply():
convert_types <- function(x) {
stopifnot(is.list(x))
x[] <- rapply(x, utils::type.convert, classes = "character",
how = "replace", as.is = TRUE)
return(x)
}
d <- data.frame(char = letters[1:5],
fake_char = as.character(1:5),
fac = factor(1:5),
char_fac = factor(letters[1:5]),
num = 1:5, stringsAsFactors = FALSE)
sapply(d, class)
#> char fake_char fac char_fac num
#> "character" "character" "factor" "factor" "integer"
sapply(convert_types(d), class)
#> char fake_char fac char_fac num
#> "character" "integer" "factor" "factor" "integer"
On my PC (R v3.2.3), apply or sapply gives an error, while lapply works well:
dt[,2:4] <- lapply(dt[,2:4], function (x) as.factor(as.numeric(x)))
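A likely reason for the difference: sapply() tries to simplify its result to a matrix, and a matrix cannot hold factors, so the factor class is lost, whereas lapply() returns a list of factors that can be assigned back column by column. A small check with a made-up data frame dt:

```r
dt <- data.frame(id = 1:3, a = c("1", "2", "2"), b = c("3", "3", "4"),
                 stringsAsFactors = FALSE)

s <- sapply(dt[, 2:3], function(x) as.factor(as.numeric(x)))
l <- lapply(dt[, 2:3], function(x) as.factor(as.numeric(x)))

class(s)                  # a matrix: the factor class is gone
sapply(l, is.factor)      # TRUE for both columns

dt[, 2:3] <- l            # assign the list back; the columns stay factors
str(dt)
```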
Considering that there might be character columns, this is based on @Abdou's answer to "Get column types of excel sheet automatically":
makenumcols <- function(df) {
  df <- as.data.frame(df)
  # TRUE for columns whose non-NA values all parse as numbers
  cond <- vapply(df, function(x) {
    x <- as.character(x[!is.na(x)])  # as.character deals with factors
    all(!is.na(suppressWarnings(as.numeric(x))))
  }, logical(1))
  numeric_cols <- names(df)[cond]
  # convert via character so factors yield their labels, not their codes
  df[numeric_cols] <- lapply(df[numeric_cols], function(x) as.numeric(as.character(x)))
  return(df)
}
df <- makenumcols(df)
To convert a character column to numeric, you have to convert it into a factor first by applying:
BankFinal1 <- transform(BankLoan, LoanApproval=as.factor(LoanApproval))
BankFinal1 <- transform(BankFinal1, LoanApp=as.factor(LoanApproval))
You have to make two columns with the same data, because the character column cannot be converted to numeric directly. If you try the conversion directly, you get the following warning:
transform(BankData, LoanApp=as.numeric(LoanApproval))
Warning message: In eval(substitute(list(...)), `_data`, parent.frame()) : NAs introduced by coercion
So, after creating two columns with the same data, apply:
BankFinal1 <- transform(BankFinal1, LoanApp = as.numeric(LoanApp),
LoanApproval = as.numeric(LoanApproval))
and it will convert the character values to numeric successfully (as the factor's integer level codes).
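Note that numbers produced by converting a factor are its level codes, and by default the levels are sorted alphabetically. Passing levels to factor() fixes the coding explicitly. A sketch with a hypothetical data frame loans:

```r
loans <- data.frame(LoanApproval = c("Yes", "No", "Yes", "No"),
                    stringsAsFactors = FALSE)

# Default levels are sorted: "No" = 1, "Yes" = 2
as.numeric(factor(loans$LoanApproval))              # 2 1 2 1

# Explicit levels control the coding: "Yes" = 1, "No" = 2
loans <- transform(loans,
                   LoanApp = as.numeric(factor(LoanApproval, levels = c("Yes", "No"))))
loans$LoanApp                                       # 1 2 1 2
```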
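If the data frame has columns of multiple types, some character and some numeric, try the following to convert just the columns that contain numeric values to numeric (any non-numeric entries in a converted column become NA):
for (i in seq_along(data)) {
  # values of column i that parse as numbers (NAs dropped first)
  vals <- suppressWarnings(as.numeric(as.character(data[[i]][!is.na(data[[i]])])))
  # convert the column if at least one value is numeric
  if (any(!is.na(vals))) {
    data[[i]] <- as.numeric(as.character(data[[i]]))
  }
}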
With hablar::convert
To easily convert multiple columns to different data types, you can use hablar::convert. The syntax is simple: df %>% convert(num(a)) converts the column a of df to numeric.
Detailed example
Let's convert all columns of mtcars to character.
library(dplyr)
df <- mtcars %>% mutate_all(as.character) %>% as_tibble()
> df
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 21 6 160 110 3.9 2.62 16.46 0 1 4 4
2 21 6 160 110 3.9 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
With hablar::convert:
library(hablar)
# Convert columns to integer, numeric and factor
df %>%
convert(int(cyl, vs),
num(disp:wt),
fct(gear))
results in:
# A tibble: 32 x 11
mpg cyl disp hp drat wt qsec vs am gear carb
<chr> <int> <dbl> <dbl> <dbl> <dbl> <chr> <int> <chr> <fct> <chr>
1 21 6 160 110 3.9 2.62 16.46 0 1 4 4
2 21 6 160 110 3.9 2.88 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
4 21.4 6 258 110 3.08 3.22 19.44 1 0 3 1
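For comparison, roughly the same conversion can be done in base R without extra packages, by applying one conversion function per group of columns with lapply(). This is a sketch of an equivalent result, not how hablar works internally, and it uses a plain data.frame rather than a tibble:

```r
# All columns of mtcars as character, like df above (but a data.frame)
df <- as.data.frame(lapply(mtcars, as.character), stringsAsFactors = FALSE)

int_cols <- c("cyl", "vs")
num_cols <- c("disp", "hp", "drat", "wt")   # the disp:wt range

df[int_cols] <- lapply(df[int_cols], as.integer)
df[num_cols] <- lapply(df[num_cols], as.numeric)
df$gear      <- factor(df$gear)

sapply(df, class)
```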