Currently, we have a set of containers that can be used in the compiler.
As we started to build more powerful runtimes, we might want to
introduce certain container types to our runtime.
This is an RFC to discuss potential new container designs for the runtime.
This RFC focuses on the array-type container as they are simpler, more
lightweight and has a demand in the runtime.
Array: An Example
We need to consider the following factors when designing a
runtime container type.
- F1: Have a POD-C compatible signature for cross language purposes.
- F2: Being able to move c++ container into it.
- F3: Reduce indirect memory allocation when possible.
We discuss the design for array as an example to demonstrate
how to get to these three points in the code below.
// Base array object
class ArrayObj : public Object {
private:
// Use 32bit size?
uint32_t size_;
// the capacity for allocation.
uint32_t capacity_;
// The elements in the array.
ObjectRef* data_;
// helper class to move from std::vector
class FromStd;
};
// Move from std
class ArrayObj::FromStd : public ArrayObj {
private:
std::vector<ObjectRef> data_;
explicit FromStd(std::vector<ObjectRef> data)
: data_(data) {
data_ = data_.data();
CHECK_LE(data_.capacity(), UINT_MAX);
size_ = static_cast<uint32>(data_.size());
capacity_ = static_cast<uint32>(data_.capacity());
}
friend ObjectPtr<ArrayObj> ArrayFromVector;
};
// make string by move a string without copying the content.
ObjectPtr<ArrayObj> ArrayFromVector(std::vector<ObjectRef> data) {
return make_object<ArrayObj::FromStd>(data);
}
// Special function to make array of size in a single block
ObjectPtr<ArrayObj> ArrayAlloc(size_t n, size_t cap) {
// simplified code, need additional code to handle alignment
void* ptr = malloc(sizeof(ArrayObj) + cap * sizeof(ObjectRef));
ArrayObj* header = static_cast<ArrayObj*>(ptr);
// initialize the header.
header->data_ = reinterpret_cast<ObjectRef*>(
static_cast<char*>(ptr) + sizeof(ArrayObj));
header->size_ = size;
header->capacity_ = cap;
// initialize other fields, such as destructor
//
return make_object<ArrayObj::FromStd>(data);
}
Discussions about the code.
- vector is not exposed in ArrayObj(F1), instead we directly use POD data.
- data field is necessary to implement F2 (see FromStd).
This field allows us to move in from std::vector container without copying the content. - When creating an array by ourselves, we can directly allocate the
data chunk right after the ArrayObj header (ArrayAlloc).
However, we will need a special allocator support to do so. - For our use cases in IR, it is unlikely that an array will
go beyond 32bit int, so we could use 32 bit int to store the data,
altough it might be overkill to save 8 bytes. - The total overhead of array(without additional elements) is 4 * 8 bytes (if we use uint32 for size).
We can make several possible design variations.
- If we do not want the ability to move from an existing container, we can remove the data field.
- If we do not need frequent resizing of the array(e.g. push_back),
then the capacity field is not necessary (which is useful for Tuple and ADT). - Wehether it is worth to save 8 bytes by making the size and capacity int32,
or should we go with int64_t.
Proposal for other Containers
In this section, we propose two other containers: ADT and String.
// Base ADT object
class ADTObj : public Object {
private:
// tag for ADT
uint32_t tag_;
// the capacity for allocation.
uint32_t size_;
// The elements in the tuple
// ObjectRef* data_;
};
In the case of ADT, we do not need to resize.
As a result, we can simply remove the capacity field and replace it by a tag field.
It may or makes sense to introduce the data field.
Given that we mostly want to allocate the content, we might as well remove the data field.
Note that means we lose the ability to move from an existing container.
// Base String object
class StringObj : public Object {
private:
// the capacity for allocation.
uint64_t size_;
// The elements in the array.
char* data_;
};
String is another interesting case.
In this case the ability to move from std::string is useful(which means we need data_).
Size may need to be 64bit in case we use it to pass a big binary blob.
It is less frequent to resize an string versus an array, so we may not need the capacity field.
String can be used to serve as a container for passing return std::string in TVMRetValue.
Why the three containers
- ADT is already used in vm so that it is more of a proposal to upgrade it to be more cross platform.
- Array is currently being used in compiler but not in runtime, via support using std::vector, it might be interesting to see if we would like to bring it to runtime. Regardless of whether it should be in runtime, we might want to update to keep the signature in POD-C compatible style so that it can be cheaply accessed in another language
- String is an interesting one. If we want to make the IR accessible in a full POD-C compatible fashion, then we need a string container that is not std::string, it might also benefit how we return strings from a PackedFunc.
Why POD-C compatible types
The main advantage is the cross language. Imagine we are building a pure C code generator(or llvm) that interacts with these container objects. If the array is hidden behind a std::vector, then we will have to call into a runtime function ADTGetElem in order to get the corresponding element. We can directly generate an accessor code to get the corresponding element by addressing.
Summary
In this RFC, we discussed design variants for runtime array containers and their trade-offs.
It would be great if we can have a discussion about:
- Whether we want to introduce these container objects
(in particular String and Array) into runtime. - What choice do you like for each container type.