I was wondering how they could claim unbounded cardinality, because what I've seen in TSDBs is a mapping from each unique time series to an ID, usually a 64-bit integer. I've always been interested in building an in-memory TSDB someday, mostly because I'd like to use it locally anywhere: mobile, backend, and frontend.
The only way I'd ever thought of achieving unbounded cardinality is to get rid of the mapping and store the series text close to the data. In practice, though, the only fix for the cardinality problem I'd ever considered was keeping the 64-bit integers but making the time series IDs local rather than global. To keep things short, each segment would effectively be a whole new database.
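To make the "local IDs per segment" idea concrete, here's a minimal sketch, assuming each segment owns its own series-key -> ID dictionary. The `Segment` class and the `"cpu,host=a"` key format are hypothetical, just for illustration. The point is that no single global dictionary has to hold every series ever seen.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical segment with its own series-key -> local-ID mapping.

    IDs are only unique within this segment, so each segment behaves
    like a small, self-contained database.
    """

    ids: dict = field(default_factory=dict)     # series key -> local ID
    points: dict = field(default_factory=dict)  # local ID -> [(ts, value)]

    def series_id(self, key: str) -> int:
        # Assign the next local ID the first time a key shows up here.
        if key not in self.ids:
            self.ids[key] = len(self.ids)
            self.points[self.ids[key]] = []
        return self.ids[key]

    def append(self, key: str, ts: int, value: float) -> None:
        self.points[self.series_id(key)].append((ts, value))


seg_a, seg_b = Segment(), Segment()
seg_a.append("cpu,host=a", 1, 0.5)
seg_a.append("cpu,host=b", 1, 0.7)
seg_b.append("cpu,host=b", 2, 0.9)
# The same series gets a different ID in each segment.
print(seg_a.ids["cpu,host=b"], seg_b.ids["cpu,host=b"])
```

The trade-off is that a query spanning several segments has to resolve the series key in each segment separately, since the IDs don't line up across them.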
M3DB
If I remember correctly, M3DB uses an inverted index that maps each tag to the set of unique time series IDs carrying it (tag -> series ID set).
To fetch time series, you look up the ID set for each tag in the query (stored as a roaring bitmap), intersect the bitmaps, which is really fast, and then fetch the time series for the IDs left in the resulting bitmap.
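Roughly, the lookup-intersect-fetch flow looks like the sketch below. It uses plain Python ints as bitsets in place of roaring bitmaps, so it shows the shape of the idea rather than how M3DB actually stores things. `index_series` and `query` are hypothetical names.

```python
import operator
from collections import defaultdict
from functools import reduce

# Stand-in for roaring bitmaps: bit i set means series ID i carries the tag.
postings = defaultdict(int)  # tag -> bitset of series IDs
series = {}                  # series ID -> [(ts, value)]


def index_series(series_id, tags, points):
    series[series_id] = points
    # Add the series ID to the posting bitmap of every tag it carries.
    for tag in tags:
        postings[tag] |= 1 << series_id


def query(tags):
    # AND the bitmaps of all query tags (tags must be non-empty)...
    bits = reduce(operator.and_, (postings.get(t, 0) for t in tags))
    # ...then fetch the series for every ID still set.
    ids = [i for i in range(bits.bit_length()) if bits >> i & 1]
    return {i: series[i] for i in ids}


index_series(0, ["host=a", "region=us"], [(1, 0.5)])
index_series(1, ["host=b", "region=us"], [(1, 0.7)])
index_series(2, ["host=a", "region=eu"], [(1, 0.9)])
print(query(["host=a", "region=us"]))  # {0: [(1, 0.5)]}
```

Real roaring bitmaps are compressed and chunked, which is what keeps the intersection fast even with millions of IDs.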
IOx
I dug through the IOx code to find where they store the time series, because that would give me an idea of how they achieved unbounded cardinality. It looks like they view the problem as a table structure instead, since that fits Apache Arrow.
Just an example: the first column is a set of tags (a packed string array), the second column is the timestamp, and the remaining columns can be integer, float, bool, or string. So when you search for the time series you want, you search the first column, take the row indexes that matched the tag search, and voilà: blocks of data that you can filter further.
Assuming what I've read so far is correct, it's definitely not an approach I'd thought of before, and it's fast, which I suppose is mostly down to Apache's DataFusion.
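Here's a rough sketch of that table-scan flow in plain Python, following the example layout above (tags column, timestamp column, one value column). The column names and the comma-packed tag strings are assumptions for illustration. Note that there's no series ID anywhere: the tag column itself is what gets searched.

```python
# Hypothetical columnar table following the example layout above.
table = {
    "tags": ["host=a,region=us", "host=b,region=us",
             "host=a,region=us", "host=a,region=eu"],
    "time": [1, 1, 2, 2],
    "value": [0.5, 0.7, 0.6, 0.9],
}


def matching_rows(tags_col, wanted):
    # Scan the tags column; keep the index of every row carrying all wanted tags.
    return [i for i, packed in enumerate(tags_col)
            if wanted <= set(packed.split(","))]


def take(table, rows):
    # Gather the selected rows from every column (similar in spirit to Arrow's take).
    return {name: [col[i] for i in rows] for name, col in table.items()}


rows = matching_rows(table["tags"], {"host=a", "region=us"})
block = take(table, rows)
# Further filtering on the gathered block, e.g. a time predicate.
late = [v for t, v in zip(block["time"], block["value"]) if t >= 2]
print(rows, late)  # [0, 2] [0.6]
```

A real engine would vectorize the scan and keep the tag strings dictionary-encoded, but the shape is the same: filter, get row indexes, gather, filter again.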
I wonder if logging systems inspired the tabular data structure, or if they just decided it was time to throw out the idea of time series IDs.
I'm glossing over a lot of details, obviously. Pretty cool idea, though. I can't wait to see benchmarks against other TSDBs.