Dense Arrays¶
In this tutorial we will learn how to create, read, and write a simple dense array in TileDB.
Basic concepts and definitions¶
Creating a dense array¶
Note
The order of the dimensions (as added to the domain) is important later when specifying subarrays. For instance, in the above schema, subarray [1,2], [2,4] means: slice the first two values in the rows dimension domain, and values 2,3,4 in the cols dimension domain.
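To make the note concrete, here is a minimal plain-Python sketch of how such a subarray can be read as one inclusive range per dimension. It does not call the TileDB API; the helper name `subarray_cells` is hypothetical, and the dimension order (rows first, then cols) is taken from the note above.

```python
from itertools import product


def subarray_cells(*ranges):
    """Enumerate the coordinates selected by inclusive per-dimension ranges.

    Ranges are passed in the order the dimensions were added to the domain.
    This is a hypothetical helper for illustration, not a TileDB function.
    """
    axes = [range(lo, hi + 1) for lo, hi in ranges]
    return list(product(*axes))


# Assumed dimension order: rows first, then cols.
cells = subarray_cells((1, 2), (2, 4))
print(cells)
```

Swapping the two ranges would select rows 2 to 4 and columns 1 to 2 instead, which is why the order in which dimensions are added matters.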
All that is left to do is create the empty array on disk so that it can be written to. We specify the name of the array to create, and the schema to use. This command will essentially persist the array schema we just created on disk.
Note
The array name here will be used to create a data directory in the current working path (see On-disk structure below). The array name can also be a full URI, for example a path like file:///home/username/my_array or an S3 URI like s3://bucket-name/array-name.
Writing to the array¶
We will populate the array with values 1, 2, ..., 16.
To start, prepare the data to be written:
Cell layouts are covered thoroughly in later tutorials. For now, what you should know is that you are telling TileDB that the cell values in your buffer will be written in row-major order in the cells of the array (i.e., 1 will be stored in cell (1,1), 2 in (1,2), etc.).
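The row-major placement described above can be sketched in plain Python. This is only a model of the layout, not TileDB code: the 4x4 shape is an assumption inferred from the 16 values and the 1-based cell coordinates, and `cell_for_buffer_index` is a hypothetical helper.

```python
NUM_ROWS, NUM_COLS = 4, 4  # assumed shape: 16 values in a 4x4 array


def cell_for_buffer_index(i, num_cols=NUM_COLS):
    """Map a 0-based buffer position to a 1-based (row, col) cell, row-major."""
    return (i // num_cols + 1, i % num_cols + 1)


data = list(range(1, NUM_ROWS * NUM_COLS + 1))  # 1, 2, ..., 16
placement = {cell_for_buffer_index(i): value for i, value in enumerate(data)}

# The buffer fills a whole row before moving on to the next one.
print(placement[(1, 1)], placement[(1, 2)], placement[(2, 1)])
```

Under these assumptions, the value following 4 (the last cell of row 1) lands in cell (2,1), not in cell (1,5) or (2,4).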
The array data is now stored on disk. The resulting array is depicted in the figure below.
Reading from the array¶
We will next explain how to read the cell values in subarray [1,2], [2,4], i.e., in the blue rectangle shown in the figure above.
The result values should be 2 3 4 6 7 8, reading in row-major order (i.e., first the three selected columns of row 1, then the three selected columns of row 2).
The row-major layout here means that the cells will be returned in row-major order within the subarray [1,2], [2,4] (cell layouts are covered in more detail in later tutorials).
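As a sanity check on the expected result, the read can be modelled in plain Python. This sketch assumes a 4x4 array holding 1 through 16 in row-major order; `read_subarray` is a hypothetical helper and not part of the TileDB API.

```python
def read_subarray(grid, row_range, col_range):
    """Return the values in the inclusive 1-based ranges, in row-major order."""
    (r0, r1), (c0, c1) = row_range, col_range
    return [
        grid[r - 1][c - 1]
        for r in range(r0, r1 + 1)   # outer loop: rows
        for c in range(c0, c1 + 1)   # inner loop: columns within a row
    ]


# Assumed contents: a 4x4 array with values 1..16 written in row-major order.
grid = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]

result = read_subarray(grid, (1, 2), (2, 4))
print(result)  # [2, 3, 4, 6, 7, 8]
```

The outer loop runs over rows and the inner loop over columns, which is what produces the row-major ordering of the result.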
Now data holds the result cell values on attribute a.
If you compile and run the example of this tutorial as shown below, you should
see the following output:
On-disk structure¶
A TileDB array is stored on disk as a directory with the name given at the time of array creation. If we look into the array on disk after it has been written to, we will see something like the following:
$ ls -l quickstart_dense_array/
total 8
drwx------ 4 stavros staff 128 Jun 25 15:18 __1561490302161_1561490302161_15bab0281e2e44f2a803eb6f3001ed00
-rwx------ 1 stavros staff 149 Jun 25 15:18 __array_schema.tdb
-rwx------ 1 stavros staff 0 Jun 25 15:18 __lock.tdb
drwx------ 2 stavros staff 64 Jun 25 15:18 __meta
The array directory and files __array_schema.tdb and __lock.tdb were written upon array creation, whereas subdirectory __1561490302161_1561490302161_15bab0281e2e44f2a803eb6f3001ed00 was created after writing to the array. This subdirectory, called a fragment, contains the written cell values for attribute a in file a.tdb, along with associated metadata:
$ ls -l quickstart_dense_array/__1561490302161_1561490302161_15bab0281e2e44f2a803eb6f3001ed00/
total 16
-rwx------ 1 stavros staff 602 Jun 25 15:18 __fragment_metadata.tdb
-rwx------ 1 stavros staff 84 Jun 25 15:18 a.tdb
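The fragment directory name in the listings above appears to consist of two millisecond timestamps followed by a unique identifier. The sketch below parses it under that assumption; the layout `__<start_ms>_<end_ms>_<uuid>` is inferred from this listing only and may differ across TileDB versions, and `parse_fragment_name` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_fragment_name(name):
    """Split a name of the assumed form __<start_ms>_<end_ms>_<uuid>.

    Returns (start, end, uuid), with timestamps as UTC datetimes truncated
    to whole seconds.
    """
    start_ms, end_ms, uuid = name.lstrip("_").split("_")

    def to_utc(ms):
        # Integer division drops the millisecond part to avoid float rounding.
        return datetime.fromtimestamp(int(ms) // 1000, tz=timezone.utc)

    return to_utc(start_ms), to_utc(end_ms), uuid


fragment = "__1561490302161_1561490302161_15bab0281e2e44f2a803eb6f3001ed00"
start, end, uuid = parse_fragment_name(fragment)
print(start.isoformat(), uuid)
```

For this fragment, both timestamps decode to 2019-06-25 19:18:22 UTC, which would match the Jun 25 15:18 modification time in the listing if the machine's clock was four hours behind UTC.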
The TileDB array hierarchy on disk and more details about fragments are discussed in later tutorials.