I have two data frames with X, Y, and Z coordinates. I want to find the distance between all of the points in the two data frames (like the distance between entry A1 and every entry in B, A2 and every entry in B, and so on, and vice versa). I basically did this:
1.) Wrote a function that calculates the distance between two points.
2.) Used the distanceFinder function to create a function that finds the distance between one point in a group, and every other point in the opposite group.
3.) Created a function called bigDistance() that calls filter() on every entry in one group, and appends the results to an empty data frame through a for loop until it's completed.
This code takes about 2 minutes to run on the file I'm experimenting with, and I just found out that I have to translate this algorithm to PHP... so I guess this is kind of an optimization question, because I feel like PHP would be way slower at making these computations than R? Sorry if people find this "off-topic" but yeah, super new to programming and Big O notation and stuff, so any tips would be amazing! Thanks!
The dist function does what you are looking for: it returns the distance (Euclidean by default) between every pair of rows of a data frame.
myDf <- data.frame(
x = rnorm(8),
y = rnorm(8),
z = rnorm(8)
)
dist(myDf)
#           1         2         3         4         5         6         7
# 2 3.0457054
# 3 1.7260658 3.2107845
# 4 1.2839101 3.4596211 2.9451175
# 5 1.5656231 4.0154389 2.3421445 2.3612348
# 6 1.9294650 1.6655718 1.7977887 2.8726174 2.5815296
# 7 2.1842743 3.5274692 3.8552701 1.0984651 2.9951244 3.3220919
# 8 1.4795857 3.5364663 0.5567753 2.7033371 1.9226225 2.0631788 3.6624082
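Since the question is about two separate data frames, here is a minimal sketch of one way to get only the distances between the groups: stack them, call dist once, and keep the A-by-B block of the resulting matrix. The names A, B, full and crossDist are made up for illustration and are not from the question.

```r
# Hypothetical example data: two groups of 3-D points
# (A and B are illustrative names, not the asker's actual data)
set.seed(1)
A <- data.frame(x = rnorm(4), y = rnorm(4), z = rnorm(4))
B <- data.frame(x = rnorm(5), y = rnorm(5), z = rnorm(5))

# Stack both groups and compute all pairwise Euclidean distances once
full <- as.matrix(dist(rbind(A, B)))

# Keep only the block with rows from A and columns from B
crossDist <- full[seq_len(nrow(A)), nrow(A) + seq_len(nrow(B)), drop = FALSE]

dim(crossDist)  # 4 rows (points in A) by 5 columns (points in B)
```

Here crossDist[i, j] is the distance between row i of A and row j of B. This also computes the within-group distances and throws them away, so it trades some wasted work for simplicity.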
It seems to be pretty fast as well (a median of about 73 microseconds, 77 on average, for these 8 points)
library(microbenchmark)
mb <- microbenchmark(dist(myDf))
mb
# Unit: microseconds
# expr min lq mean median uq max neval
# dist(myDf) 70.436 71.453 77.4083 72.978 82.133 172.911 100
library(ggplot2) # autoplot() is a ggplot2 generic, so ggplot2 must be loaded
autoplot(mb)
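The timing above is for 8 points only. dist computes one distance per unordered pair of rows, so the work grows as n(n-1)/2. A rough sketch for checking this on a larger input, assuming a hypothetical data set bigDf with n rows (both names are illustrative):

```r
# Hypothetical larger data set (bigDf and n are illustrative names)
n <- 1000
bigDf <- data.frame(x = rnorm(n), y = rnorm(n), z = rnorm(n))

d <- dist(bigDf)

# A dist object stores one value per unordered pair of rows
length(d) == n * (n - 1) / 2  # TRUE

# Wall-clock time for this size; the figure depends on the machine
system.time(dist(bigDf))
```

Doubling the number of points roughly quadruples the number of distances, and that holds in any language, PHP included.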