My sample CSV looks like this:
user_id lat lon
1 19.111841 72.910729
1 19.111342 72.908387
2 19.111542 72.907387
2 19.137815 72.914085
2 19.119677 72.905081
2 19.129677 72.905081
3 19.319677 72.905081
3 19.120217 72.907121
4 19.420217 72.807121
4 19.520217 73.307121
5 19.319677 72.905081
5 19.419677 72.805081
5 19.629677 72.705081
5 19.111860 72.911347
5 19.111860 72.931346
5 19.219677 72.605081
6 19.319677 72.805082
6 19.419677 72.905086
I know I can use the haversine formula to calculate the distance (Python also has a haversine package):
import math

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees).
    Source: http://gis.stackexchange.com/a/56589/15183
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a))
    # 6371 km is the mean radius of the earth
    km = 6371 * c
    return km
However, I only want to calculate the distance between consecutive rows that share the same user_id.
So the expected output would look like this:
user_id lat lon result
1 19.111841 72.910729 NaN
1 19.111342 72.908387 xx*
2 19.111542 72.907387 NaN
2 19.137815 72.914085 xx
2 19.119677 72.905081 xx
2 19.129677 72.905081 xx
3 19.319677 72.905081 NaN
3 19.120217 72.907121 xx
4 19.420217 72.807121 NaN
4 19.520217 73.307121 xx
5 19.319677 72.905081 NaN
5 19.419677 72.805081 xx
5 19.629677 72.705081 xx
5 19.111860 72.911347 xx
5 19.111860 72.931346 xx
5 19.219677 72.605081 xx
6 19.319677 72.805082 NaN
6 19.419677 72.905086 xx
* xx stands for the distance in km.
How can I do this?
PS: I am using pandas.
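One possible approach, as a minimal sketch rather than a definitive answer: shift lat and lon down by one row within each user_id using groupby(...).shift(), then pass the shifted and current columns to a NumPy-vectorized haversine. The first row of every user_id has no previous row, so its result stays NaN. The name haversine_km, its (lat, lon) argument order, and the inline sample frame (the first six rows of the data above) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean earth radius, same constant as in the question


def haversine_km(lat1, lon1, lat2, lon2):
    """Vectorized great-circle distance in km; inputs are decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# First six rows of the sample data (user_id 1 and 2), inlined for the sketch.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 2],
    "lat": [19.111841, 19.111342, 19.111542, 19.137815, 19.119677, 19.129677],
    "lon": [72.910729, 72.908387, 72.907387, 72.914085, 72.905081, 72.905081],
})

# Previous row's coordinates within the same user_id; NaN on each group's first row.
prev = df.groupby("user_id")[["lat", "lon"]].shift()

# NaN inputs propagate through the formula, so group-first rows end up NaN.
df["result"] = haversine_km(prev["lat"], prev["lon"], df["lat"], df["lon"])
print(df)
```

This assumes the rows are already in the order in which the distances should be taken. With a real file, the inline frame would be replaced by a pd.read_csv call using whatever separator the file actually has.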