Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

r - What is the internal implementation of lists?

I am curious how an object of type list is implemented. Is it

  1. a dynamic vector that will automatically increase its size when it is full.
  2. a linked list where appending an item is O(1), but accessing an item is O(n).
  3. a tree structure with O(log(n)) item access.
  4. a hashtable with O(1) item access.

I am curious because lists can have key-value pairs that make them look like hash tables, but the elements are in order, which looks like a vector.

Edit: length(list(runif(1e4))) is 1, so the second loop below appends 10,000 elements to a list. It looks like the whole list is copied on every append, which makes it very slow. Assigning into a numeric vector that already has 10,000 elements is fast:

z1 <- runif(1e4)
system.time({
  for(i in 1:10000) z1[[1 + i]] <- 1
})

outputs:

user  system elapsed 
0.060   0.000   0.062 

but appending to a list that starts with a single element is much slower:

z1 <- list(runif(1e4))
system.time({
  for(i in 1:10000) z1[[1 + i]] <- 1
})

outputs:

user  system elapsed 
1.31    0.00    1.31 

If the list is initialized with 10,000 elements first, it is fast again:

z1 <- as.list(runif(1e4))
system.time({
  for(i in 1:10000) z1[[1 + i]] <- 1
})

outputs:

user  system elapsed 
0.060   0.000   0.065 

For access by key:

z1 <- list()
for(i in 1:10000){key <- as.character(i); z1[[key]] <- i} 
system.time({
  for(i in 1:10000) x <- z1[["1"]]
})
system.time({
  for(i in 1:10000) x <- z1[["10000"]]
})

The output is:

user  system elapsed 
0.01    0.00    0.01 
user  system elapsed 
1.78    0.00    1.78 

Access by key is not O(1), so it is not a hash table. My conclusion is that it is not a dynamic array either, since appending an item seems to reallocate and copy memory every time.

1 Reply

Lists are essentially just arrays of R objects (SEXP). Resizing copies the whole array, and lookup by name is a linear search.
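
As a minimal sketch of how to avoid the repeated copies described above, the list can be allocated at its final length up front with vector("list", n) and then filled in place. The names n and z are only illustrative, and the final length is assumed to be known in advance.

```r
# Illustrative sketch: preallocate the list instead of growing it.
n <- 10000

# vector("list", n) creates a list of n NULL elements in one allocation.
z <- vector("list", n)

# Filling existing slots does not change the length of the list,
# so no resize is needed inside the loop.
for (i in seq_len(n)) {
  z[[i]] <- i
}

length(z)
```

If the final length is not known in advance, an environment may be a better fit.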

Alternatively, you can use environments, which use hash tables internally.
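
A minimal sketch of an environment used as a key-value store, reusing the keys from the question. The variable names are illustrative, and timings are left out because they depend on the machine and the R version.

```r
# Illustrative sketch: an environment as a mutable key-value store.
h <- new.env(hash = TRUE)

# Insert 10,000 key-value pairs; keys must be character strings.
for (i in 1:10000) {
  assign(as.character(i), i, envir = h)
}

# Look up values by key, either with get() or with [[ ]].
first <- get("1", envir = h)
last <- h[["10000"]]

# Check for a missing key without searching parent environments.
has_key <- exists("10001", envir = h, inherits = FALSE)

# Number of stored keys.
n_keys <- length(ls(h))
```

Unlike a list, an environment has reference semantics and no defined element order, so it suits lookup by key rather than positional access.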

