Sparse Arrays
In this tutorial we will learn how to create, read, and write a simple sparse array in TileDB.
Basic concepts and definitions
Creating a sparse array
Note
The order of the dimensions (as added to the domain) is important later when specifying subarrays. For instance, in the above example, subarray [1,2], [2,4] means slice the first two values in the rows dimension domain, and values 2,3,4 in the cols dimension domain.
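To make the dimension-order rule concrete, here is a minimal plain-Python sketch. It does not use the TileDB API, and the helper name in_subarray is hypothetical. It treats a subarray as one inclusive range per dimension, listed in the order the dimensions were added to the domain (rows first, then cols, as in the note above).

```python
# Plain-Python model of subarray semantics; this is NOT the TileDB API.
# Assumption: dimensions were added to the domain as rows first, cols second.

def in_subarray(coords, subarray):
    """Return True if coords lies inside subarray.

    subarray holds one inclusive (low, high) range per dimension,
    in the same order as the dimensions of the domain.
    """
    return all(lo <= c <= hi for c, (lo, hi) in zip(coords, subarray))

subarray = [(1, 2), (2, 4)]  # rows 1..2, cols 2..4

inside = in_subarray((2, 4), subarray)   # row 2 in [1,2], col 4 in [2,4]
outside = in_subarray((1, 1), subarray)  # col 1 is outside [2,4]

# Swapping the two ranges selects a different region: rows 2..4, cols 1..2.
swapped = [(2, 4), (1, 2)]
inside_swapped = in_subarray((2, 4), swapped)  # col 4 is outside [1,2]

print(inside, outside, inside_swapped)
```

The same pair of ranges selects a different region once the order is swapped, which is why the order of the dimensions in the domain matters.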
All that is left to do is create the empty array on disk so that it can be written to. We specify the name of the array to create, and the schema to use. This command will essentially persist the array schema we just created on disk.
Writing to the array
We will populate the array by writing some values to its cells, specifically 1, 2, and 3 at cells (1,1), (2,4) and (2,3), respectively. Notice that, contrary to the dense case, here we specify the exact indices where the values will be written, i.e., we provide the cell coordinates.
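The idea of a coordinate-based write can be sketched in plain Python. This is only a conceptual model, not the TileDB API; the names coords, data, and cells are hypothetical. Each value is paired with the coordinates of the cell it belongs to, and cells that are never written do not exist at all.

```python
# Plain-Python model of a sparse write; this is NOT the TileDB API.
coords = [(1, 1), (2, 4), (2, 3)]  # (row, col) of each cell to write
data = [1, 2, 3]                   # attribute value for each cell, same order

# Pair every value with its coordinates. Only written cells are stored.
cells = dict(zip(coords, data))

print(cells[(2, 3)])    # value written at cell (2,3)
print((1, 2) in cells)  # cell (1,2) was never written
print(len(cells))       # number of non-empty cells
```

In a dense write the positions would be implied by the order of the values inside a subarray. Here the positions are explicit, so the values can be supplied in any order.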
The array data is now stored on disk. The resulting array is depicted in the figure below.
Reading from the array
We will next explain how to read the cell values in subarray [1,2], [2,4], i.e., in the blue rectangle shown in the figure above. The result values should be 3 2, reading in row-major order: of the three written cells, only (2,3) and (2,4) fall inside the subarray, and (2,3) comes first. The row-major layout here means that the cells will be returned in row-major order within the subarray [1,2], [2,4] (cell layouts are covered in more detail in later tutorials).
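The expected result can be checked with a small plain-Python model of the read. This is a sketch of the semantics only, not the TileDB API, and read_row_major is a hypothetical helper. It keeps the cells whose coordinates fall inside the subarray and sorts them by row, then by column.

```python
# Plain-Python model of a sparse subarray read; this is NOT the TileDB API.
cells = {(1, 1): 1, (2, 4): 2, (2, 3): 3}  # the three cells written earlier

def read_row_major(cells, row_range, col_range):
    """Return ((row, col), value) pairs inside the subarray, in row-major order."""
    (r_lo, r_hi), (c_lo, c_hi) = row_range, col_range
    hits = [
        (rc, value)
        for rc, value in cells.items()
        if r_lo <= rc[0] <= r_hi and c_lo <= rc[1] <= c_hi
    ]
    # Sorting (row, col) tuples orders by row first, then column: row-major.
    hits.sort(key=lambda item: item[0])
    return hits

result = read_row_major(cells, (1, 2), (2, 4))
values = [value for _, value in result]
print(values)  # [3, 2]
```

Cell (1,1) is dropped because its column lies outside [2,4], which leaves the values 3 and 2 in that order.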
If you compile and run this tutorial example as shown below, you should see the following output:
On-disk structure
A TileDB array is stored on disk as a directory with the name given at the time of array creation. If we look into the array on disk after it has been written to, we will see something like the following:
$ ls -l quickstart_sparse_array/
total 8
drwx------ 5 stavros staff 160 Jun 25 15:22 __1561490578769_1561490578769_9e429a59930b4a9c83baa57eb2fb41a8
-rwx------ 1 stavros staff 153 Jun 25 15:22 __array_schema.tdb
-rwx------ 1 stavros staff 0 Jun 25 15:22 __lock.tdb
drwx------ 2 stavros staff 64 Jun 25 15:22 __meta
The array directory and files __array_schema.tdb and __lock.tdb were written upon array creation, whereas subdirectory __1561490578769_1561490578769_9e429a59930b4a9c83baa57eb2fb41a8 was created after writing to the array. This subdirectory, called a fragment, contains the written cell values for attribute a in file a.tdb and the corresponding coordinates in a separate file __coords.tdb, along with associated metadata:
$ ls -l quickstart_sparse_array/__1561490578769_1561490578769_9e429a59930b4a9c83baa57eb2fb41a8/
total 24
-rwx------ 1 stavros staff 106 Jun 25 15:22 __coords.tdb
-rwx------ 1 stavros staff 611 Jun 25 15:22 __fragment_metadata.tdb
-rwx------ 1 stavros staff 32 Jun 25 15:22 a.tdb
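The fragment directory name in the listing can be taken apart with a short Python sketch. The text above does not describe the naming scheme, so this rests on an assumption: the name has the form `__<t1>_<t2>_<id>`, where the two numeric fields are Unix timestamps in milliseconds. That reading is consistent with the Jun 25 date in the listing, but it is not confirmed here. The helper parse_fragment_name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def parse_fragment_name(name):
    """Split '__<t1>_<t2>_<id>' into two UTC datetimes and the id string.

    Assumption: t1 and t2 are Unix timestamps in milliseconds.
    """
    t1, t2, frag_id = name.lstrip("_").split("_", 2)
    to_utc = lambda ms: EPOCH + timedelta(milliseconds=int(ms))
    return to_utc(t1), to_utc(t2), frag_id

name = "__1561490578769_1561490578769_9e429a59930b4a9c83baa57eb2fb41a8"
start, end, frag_id = parse_fragment_name(name)

print(start.isoformat())  # 2019-06-25T19:22:58.769000+00:00
print(start == end)       # True: both fields are identical in this name
print(frag_id)            # 9e429a59930b4a9c83baa57eb2fb41a8
```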
The TileDB array hierarchy on disk and more details about fragments are discussed in later tutorials.