DTSource Basics

DTSource stands for the DataTank Source library and has been around for a while. The class library has several parts:

  • Array classes: C++ does not have a proper multi-dimensional array class, and since such arrays are absolutely crucial in numerical work, everyone ends up depending on a library like this. Rather than using a template class, DTSource has separate array classes for double, float, int, short and char, which makes the code easier to debug and read. It also contains templates for creating arrays and lists of any other class. The arrays are 1, 2 or 3 dimensional and use function notation to access elements, i.e. A(2,3), A(0,1,3), A(5) etc. There are no four dimensional arrays, because in numerics if you find yourself wanting a 4 dimensional array you are doing something wrong and should re-think your data structure.
  • Data Files: Two binary file formats are supported, .dtbin and .mat. The .mat format is Matlab file format 4, which is sufficient to save arrays, text and numbers. The .dtbin format is specific to tools made by Visual Data Tools. It is a container format just like .mat, but has some improvements such as indexes and better support for long variable names/unicode. Files can get very large, with hundreds of thousands of arrays. Matlab will run into problems reading some of those .mat files since it wants to load the entire file, whereas DTSource gives you random access to entries in the binary file.
  • Data Types: These are object types defined in DataTank and ImageTank, and are meant to allow you to move data back and forth. Each data type includes code to read and write this object type to a data file, which can be a .mat or .dtbin file. There is a naming convention so that ImageTank can recognize what variables are defined in the file, their names and types.

Arrays

The array classes are DTDoubleArray, DTDoubleComplexArray, DTFloatArray, DTIntArray, DTUShortArray, DTShortArray, DTUCharArray and DTCharArray. Which one you use depends on the type of the values. The structure and functions are pretty much the same, so here we go through the structure for the double array (64 bit floating point numbers). The others are similar.

There are two key concepts that you need to get from the beginning. The first is the difference between DTDoubleArray and DTMutableDoubleArray, and the other is how the data is stored and how that affects assignments and changes.

DTDoubleArray vs DTMutableDoubleArray

Technically DTDoubleArray is the base class for DTMutableDoubleArray. That means that any function that expects a DTDoubleArray can accept a DTMutableDoubleArray, but not vice versa. The DTDoubleArray can not be modified, so in order to create it, you first need to create a DTMutableDoubleArray and then assign it to a DTDoubleArray or return it as a DTDoubleArray. For example

DTMutableDoubleArray values(5);
values = 0;
DTDoubleArray a = values;

In the first line you create a list of values with length 5. This just allocates the memory, but doesn’t set it to anything. The second line sets all values to 0, and does not make it into a number like matlab would do. The third line creates a new array “a” and sets it to values. Note that you can not then say “a = 1”, since a is a read only view of the values array. In the following case some of the code will not compile

double foo1(DTMutableDoubleArray &v)
{
    return v(0);
}

double foo2(DTDoubleArray &v)
{
    return v(0);
}

DTMutableDoubleArray first(3);
first = 1;
DTDoubleArray second = first;

double v1 = foo1(first);
double v2 = foo2(first);
double v3 = foo1(second);
double v4 = foo2(second);

first = second;

v1 : Works because first is a mutable array and foo1 expects a mutable array.
v2 : Works because foo2 accepts DTDoubleArray or any class that is derived from it, and DTMutableDoubleArray is a derived class.
v3 : Does not compile. This is because foo1 requires DTMutableDoubleArray or a derived class, but DTDoubleArray is not derived from the mutable version.
v4 : An exact match, just like for v1.

first = second does not compile for a similar reason. By convention you can’t assign a read only object (second) to a read-write object (first), since that would give you a way to change the values of second.

In DTSource the following convention is observed. If you hand in a DTDoubleArray it means that the function will not change any values. And if you hand in a DTMutableDoubleArray it indicates that the function will likely change entries, since if it didn’t it would just accept the argument as DTDoubleArray.
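
The convention can be illustrated with a minimal, self-contained sketch. ReadOnlyArray, MutableArray, Sum and Scale are hypothetical stand-ins written for this example and are not DTSource classes or functions. Only the base/derived relationship and the meaning of the argument types mirror the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a read-only array (not the DTSource class).
class ReadOnlyArray {
public:
    ReadOnlyArray() : data_(std::make_shared<std::vector<double>>()) {}
    std::size_t Length() const { return data_->size(); }
    double operator()(std::size_t i) const {
        assert(i < data_->size());
        return (*data_)[i];
    }
protected:
    explicit ReadOnlyArray(std::size_t n)
        : data_(std::make_shared<std::vector<double>>(n)) {}
    std::shared_ptr<std::vector<double>> data_;
};

// Hypothetical mutable version: derives from the read-only class
// and adds write access.
class MutableArray : public ReadOnlyArray {
public:
    explicit MutableArray(std::size_t n) : ReadOnlyArray(n) {}
    using ReadOnlyArray::operator();     // keep read access on const objects
    double &operator()(std::size_t i) {
        assert(i < data_->size());
        return (*data_)[i];
    }
    MutableArray &operator=(double v) {  // set every entry to v
        for (double &entry : *data_) entry = v;
        return *this;
    }
};

// Read-only argument: the function promises not to change any entry.
double Sum(const ReadOnlyArray &v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.Length(); ++i) total += v(i);
    return total;
}

// Mutable argument: the signature signals that entries will change.
void Scale(MutableArray &v, double factor) {
    for (std::size_t i = 0; i < v.Length(); ++i) v(i) *= factor;
}
```

In this sketch Sum accepts either type, while calling Scale with a ReadOnlyArray would be rejected by the compiler, which is the same pattern as foo1 and foo2.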

Storing data

The arrays are stored using a reference counted object, sometimes called a smart pointer. This is done to avoid copying data when you are handing arrays around. It does however mean that you will have multiple arrays that point to the same data. A simple example shows this:

DTMutableDoubleArray A(6);
A = 3;
A(5) = 5;
DTDoubleArray B;
B = A;
A(3) = 2;
double v1 = A(3);
double v2 = B(3);

At the end v1 and v2 have the same value, because of the following. The first line allocates enough memory for 6 entries. The second line sets all values to 3 and the third sets the last entry to 5.
The fourth line creates a read only array B, but it is empty.
The fifth line sets B to A, but what happens is that both now point to the same list of 6 entries.
The sixth line sets the fourth entry in that list to 2. That means that the values read in the last two lines, v1 and v2, are the same, because A and B point to the same list. If the intent was for B to hold the current values and be separate from A, you need to make that explicit by using the Copy() member function, that is

B = A.Copy();

This design is used throughout DTSource. Copies are shallow by default, not deep. This avoids the issue where an innocent math statement suddenly has the side effect of copying potentially a lot of data and eating up memory. Classes in DTSource are read only unless their name explicitly includes Mutable, so memory can be shared between read only objects since none of them will change it. You can also ask an array how many objects share the same data by using the member functions ReferenceCount() and MutableReferences(). If the first returns 1, this object is the only one referencing the list. If the second returns 0, no object exists that can change the values.
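
The difference between a shallow assignment and an explicit Copy() can be reproduced with a small sketch built on std::shared_ptr. SharedArray is a hypothetical class made up for this illustration. It leaves out the read-only/mutable split, and it is not how DTSource implements its storage.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical reference-counted array (not DTSource code).
class SharedArray {
public:
    SharedArray() : storage_(std::make_shared<std::vector<double>>()) {}
    explicit SharedArray(std::size_t n)
        : storage_(std::make_shared<std::vector<double>>(n, 0.0)) {}

    double &operator()(std::size_t i) {
        assert(i < storage_->size());
        return (*storage_)[i];
    }
    double operator()(std::size_t i) const {
        assert(i < storage_->size());
        return (*storage_)[i];
    }

    // Deep copy: allocates new storage and duplicates every entry.
    SharedArray Copy() const {
        SharedArray out;
        out.storage_ = std::make_shared<std::vector<double>>(*storage_);
        return out;
    }

    // Number of SharedArray objects that point at the same storage.
    long ReferenceCount() const { return storage_.use_count(); }

private:
    // The default copy constructor and assignment copy only this pointer,
    // which is what makes "B = A" a shallow copy.
    std::shared_ptr<std::vector<double>> storage_;
};
```

With this sketch, writing through A after "SharedArray B = A;" is visible through B, while an array produced by A.Copy() keeps its own values.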

Different sizes and dimensions

You allocate memory by using the constructor. You can define 1,2 or 3 dimensions.

DTMutableDoubleArray A;
DTMutableDoubleArray B(10);
DTMutableDoubleArray C(10,5);
DTMutableDoubleArray D(5,10,10);

The first creates an empty array, and the only reason you want to do that is if you are going to assign it later to another mutable array. You might want to do that if you don’t know the size that you need, but you need to define it there because of scoping issues.

The second creates a list, accessed with B(0),…,B(9). Any access outside these bounds does not throw a run-time exception but prints an error message. Set a breakpoint in DTError.cpp if you want to catch that while debugging.

The third creates a two dimensional array/matrix. You access it as C(i,j). You can also access it as a single list, such as C(12). This is because behind the scenes C(10,5) allocates a single list with 50 entries and then maps (i,j) to the index i+10*j, so C(12) is the same entry as C(2,1). This is called column major, or sometimes Fortran order because Fortran uses this layout for multidimensional arrays. The C language doesn’t have multidimensional arrays of this kind, and the closest thing is a list of lists, accessed as C[i][j]. If you created a 10×5 array that way, C[i] would point to the i’th list and C[i][j] would be the j’th entry in the i’th list. That means that C[i][j] and C[i][j+1] are adjacent in memory, while column major puts C(i,j) and C(i+1,j) next to each other in memory. Using column major order makes it easier to exchange data with Matlab and with many classical numerical libraries, which use the same layout. The difference is really a transpose of a matrix.
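
The two index mappings can be written out directly. The helper names ColumnMajorOffset and RowMajorOffset are made up for this sketch and are not DTSource functions. The column major formula is the i+10*j mapping from the text, generalized to a first dimension m.

```cpp
#include <cassert>
#include <cstddef>

// Offset of entry (i,j) in an m x n array stored column major:
// consecutive values of i are adjacent in memory.
constexpr std::size_t ColumnMajorOffset(std::size_t i, std::size_t j,
                                        std::size_t m) {
    return i + m * j;
}

// Offset of entry (i,j) in an m x n array stored row major (list of rows
// laid out back to back): consecutive values of j are adjacent in memory.
constexpr std::size_t RowMajorOffset(std::size_t i, std::size_t j,
                                     std::size_t n) {
    return i * n + j;
}

// For a 10x5 array, entry (2,1) is entry 12 of the underlying list.
static_assert(ColumnMajorOffset(2, 1, 10) == 12, "C(12) is C(2,1)");
// Stepping i by one moves one slot in column major order.
static_assert(ColumnMajorOffset(3, 1, 10) - ColumnMajorOffset(2, 1, 10) == 1,
              "C(i,j) and C(i+1,j) are adjacent");
// Stepping j by one moves one slot in row major order.
static_assert(RowMajorOffset(2, 2, 5) - RowMajorOffset(2, 1, 5) == 1,
              "C[i][j] and C[i][j+1] are adjacent");
```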

You can not change the size of an array. This is because the array uses contiguous memory, and the memory manager will typically have a different object right after the memory that was allocated for an array. What you need to do is to allocate the new size and copy the data over. This is done with the function IncreaseSize(). This will however only add to the last dimension. That is

DTMutableDoubleArray one(5);
DTMutableDoubleArray two(4,9);
DTMutableDoubleArray three(2,5,7);

DTMutableDoubleArray test;
test = IncreaseSize(one,10); // Works, becomes 15 entries
test = IncreaseSize(two,8); // Works, becomes 4x11
test = IncreaseSize(two,9); // Fails, 9 is not a multiple of 4
test = IncreaseSize(three,10); // Works, becomes 2x5x8
test = IncreaseSize(three,11); // Fails, 11 is not a multiple of 2*5

For all of these the old array is copied into the beginning of the new array. The new entries are not initialized, so they could be random gunk. This is easy to miss, because freshly obtained memory is often 0, while re-used memory contains whatever was stored there earlier.
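
The rule behind the working and failing calls is that the number of added entries has to be a whole multiple of the product of all dimensions except the last. The sketch below only models that size arithmetic. GrowLastDimension is a hypothetical helper, it does not allocate or copy any data, and it is not the DTSource implementation of IncreaseSize.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// dims holds the size of each dimension, e.g. {4,9} for a 4x9 array.
// Returns true and enlarges the last dimension if addLength entries make up
// a whole number of slices; returns false and leaves dims unchanged otherwise.
bool GrowLastDimension(std::vector<std::size_t> &dims, std::size_t addLength) {
    if (dims.empty()) return false;

    // Entries per step of the last dimension: product of the leading sizes.
    std::size_t stride = 1;
    for (std::size_t d = 0; d + 1 < dims.size(); ++d) stride *= dims[d];

    if (stride == 0 || addLength % stride != 0) return false;

    dims.back() += addLength / stride;
    return true;
}
```

Applied to the shapes above, {4,9} with 8 added entries becomes {4,11}, while 9 added entries is rejected because 9 is not a multiple of 4.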

One common case is when you don’t know the size of an array beforehand and want it to grow as you get new points. At the end you then need to trim the array to the final size. As an example, take a case where you are adding pairs of values (x,y) to a list. The array should be 2xN, since you want the x,y values to be adjacent in memory for speed purposes.

DTMutableDoubleArray list(2,10); // 10 = initial guess
int pos = 0;
while (condition) {
    x = something; y = something;
    if (pos==list.n()) { // No space remaining in the list
        // double the size, this is called logarithmic growth
        list = IncreaseSize(list,list.Length());
    }
    list(0,pos) = x;
    list(1,pos) = y;
    pos++;
}
list = TruncateSize(list,2*pos); // Trim the final length

TruncateSize has to truncate to a length that keeps the same first dimension. Only the last dimension can change.
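
To see why doubling is called logarithmic growth, it helps to count how often the array has to be re-allocated. The function below is a hypothetical helper written for this illustration, not a DTSource function. It assumes the growth rule from the loop above, where the array doubles whenever there is no column left for the next point.

```cpp
#include <cassert>
#include <cstddef>

// Number of times the array doubles before it can hold pointsToStore
// columns, starting from initialColumns columns.
std::size_t DoublingsNeeded(std::size_t initialColumns,
                            std::size_t pointsToStore) {
    assert(initialColumns > 0);
    std::size_t capacity = initialColumns;
    std::size_t count = 0;
    while (capacity < pointsToStore) {
        capacity *= 2;   // same effect as IncreaseSize(list,list.Length())
        ++count;
    }
    return count;
}
```

Starting from 10 columns, storing 1000 points takes 7 doublings (10, 20, 40, 80, 160, 320, 640, 1280), whereas growing by a fixed 10 columns at a time would take 99 re-allocations.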

Debugging

During development you often need to see what the content of the current array is. The array class has a number of member functions defined to make this easier. Look into the header file to see what functions are defined. But you always have the pinfo() and pall() defined for arrays. pinfo() gives you a short description, basically just the size information. pall() prints every entry to the screen, so be careful. You can also print individual entries using the () method.

Data Files

There are two main classes here, DTDataFile and DTMatlabDataFile. They are both based on the DTDataStorage class, and the read and write routines take the base class as input so you can use either class.

The basic operation is that you create a data file object that either opens an existing file or removes any existing file and creates a new one.

Entries are stored by name. There are two types, string or array. Numbers are stored as an array with dimensions 1x1x1. You can write them using the Save() method and read them using the various Read****Array() methods. You can query the file to see if a variable exists or not. If you save a variable with a name that already exists, the old entry is not removed from the file, but its value is no longer accessible. You have effectively overwritten it, but the space it used is not reclaimed. This happens even if the new data has the same size. This is done to make the file format simpler, and also so that any program that is reading the output you are writing knows that you never go backwards in the file.
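
The overwrite behavior can be modeled with an append-only list of records plus a name index. AppendOnlyStore is a hypothetical in-memory model written for this illustration. It does not read or write .dtbin or .mat files and it is not the DTDataFile implementation. It only shows why saving under an existing name hides the old value without removing it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of an append-only store indexed by variable name.
class AppendOnlyStore {
public:
    // Always appends a new record; an earlier record with the same name
    // stays where it is but is no longer reachable through the index.
    void Save(const std::string &name, const std::vector<double> &values) {
        index_[name] = records_.size();
        records_.push_back(values);
    }

    bool Contains(const std::string &name) const {
        return index_.count(name) != 0;
    }

    // Returns the most recently saved value for name.
    // Throws std::out_of_range if the name was never saved.
    const std::vector<double> &Read(const std::string &name) const {
        return records_[index_.at(name)];
    }

    // Number of records written so far, including shadowed ones.
    std::size_t RecordCount() const { return records_.size(); }

private:
    std::vector<std::vector<double>> records_;   // stands in for the file
    std::map<std::string, std::size_t> index_;   // name -> latest record
};
```

Saving "x" twice in this model leaves two records in the store, and Read("x") returns the second one.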

Many of the classes described in the next section overload the functions Read, Write and WriteOne to save content to the data file. That means that you can use a call like

WriteOne(dataFile,"name",object);

in your code to save an object into a file with a given name. The function that gets called depends on the type of the object and will typically save several arrays/strings into the data file. The object is saved in such a way that ImageTank will recognize the type.

Data Types

ImageTank defines a number of data types, and almost all of them have a matching class type in DTSource. This allows external programs to send data over to a C++ program that uses DTSource and makes it easy for the C++ program to hand one or more objects back. There are a number of different class types, and they will be explained further elsewhere, but this section focuses on DTTable.

DTTable

DTTable is a C++ class that matches the table class in ImageTank. It holds tabular data, similar to the classes called data frames in some other languages/libraries.

At the top level, a DTTable has one or more columns and zero or more rows. Each column has a unique name and a type. The types are numbers, dates, text, points, polygons, surfaces etc. It uses a number of C++ tricks to make the syntax look relatively simple. The goal is to make it easy to extract columns and data from columns and easy to create a table from columns. You can even share columns between tables.

Creating a table

You create a table from a list of columns. Each column is a DTTableColumn object, and the columns are put into a DTList<DTTableColumn>. For example, to add a couple of numerical columns

DTMutableList<DTTableColumn> columns(2);
columns(0) = CreateTableColumn("x",firstList);
columns(1) = CreateTableColumn("y",secondList);
DTTable table(columns);

The function CreateTableColumn is overloaded, but all versions have the form CreateTableColumn(name,content), where the name is a string object and content can be more than one argument. The type of column that is created depends on the content type. For example if you have a 2xN array of points and want to make a point column

DTMutableList<DTTableColumn> columns(1);
DTPointCollection2D points(xyList);
columns(0) = CreateTableColumn("point",points);
DTTable table(columns);

The second line creates a point collection object from the point list. You can combine the second and third line

DTMutableList<DTTableColumn> columns(1);
columns(0) = CreateTableColumn("p",DTPointCollection2D(xyList));
DTTable table(columns);

Extract data from table

Suppose you get a table that has two columns, x and y, and need the arrays they contain. You first extract the columns from the table, and then get the data from each column.

DTTableColumnNumber xInCol = table("x");
DTTableColumnNumber yInCol = table("y");
DTDoubleArray xList = xInCol.DoubleVersion();
DTDoubleArray yList = yInCol.DoubleVersion();

If the type of “x” is not a numerical column you get a run-time error.