Creating a CloudTurbine source (i.e., a CloudTurbine-compatible folder/file structure) can be fairly straightforward, whether done manually, with the CloudTurbine Java API, or with practically any programming language.  Reading CloudTurbine source data, however, is another matter.  The CloudTurbine Java API sink class, CTreader, was developed to handle all possible “flavors” of the CloudTurbine folder/file structure (ZIP folders, different folder hierarchies, various data file formats) and to efficiently sort through the data to fulfill a request.  We therefore encourage most users to develop their sink applications using the Java API CTreader class.  If you would like to understand more about the CloudTurbine folder/file structure (either to satisfy curiosity or as the basis for developing an API in another language), check out the Structure document.

Let’s make a simple CloudTurbine sink application to read data previously output by CTsource.  You may want to reference the CloudTurbine Javadoc documentation as you work through this tutorial.  Let’s assume the source is named CTsource and is located under a CTdata directory; that is, at ./CTdata/CTsource.  CTsource contains 100 seconds’ worth of data at 1-second intervals on each of three channels: “c0”, “c1.f32” and “c2.txt”.  Our simple sink class will read all data from the “c1.f32” channel (single-precision floats) and the “c2.txt” channel (text).

Start off by importing the CloudTurbine classes (found in “CTlib.jar”).

Define variables containing the root folder name (CTdata) and the name of the CloudTurbine source (CTsource, located under the CTdata root folder).  Create a CTreader object which points to the root folder.

All of the available channel names in a source can be obtained using the listChans method.

The oldest and newest times for any data in the source are obtained as follows.  Times are represented as seconds since the epoch (midnight UTC on January 1, 1970).
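Raw epoch values are hard to read while debugging, so it can help to convert them to calendar timestamps.  The following is a minimal sketch using only the standard java.time package; the EpochSeconds class and its toInstant method are hypothetical helpers (not part of CTlib), and the sketch assumes the times arrive as double values holding seconds since the epoch.

```java
import java.time.Instant;

// Hypothetical helper, not part of CTlib: converts epoch seconds to a UTC Instant.
final class EpochSeconds {
    private EpochSeconds() {}

    // Assumption: the time is a double holding seconds since the epoch.
    static Instant toInstant(double epochSeconds) {
        // Round to whole milliseconds so fractional seconds are preserved
        // without carrying floating-point noise into the result.
        return Instant.ofEpochMilli(Math.round(epochSeconds * 1000.0));
    }

    public static void main(String[] args) {
        double oldestTime = 1488830000.0;  // illustrative epoch value, in seconds
        System.out.println(toInstant(oldestTime));  // prints 2017-03-06T19:53:20Z
    }
}
```

Printing both the oldest and newest times this way is a quick check that a source covers the period you expect before requesting data from it.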

Do the following to get all data on channel “c1.f32”.  Similar code is used to get the string data for channel “c2.txt”.

The call to getData used above deserves some attention.  The method is shown below.

The first two arguments are the names of the source and channel we want to fetch data from.  The third argument, tget, is the data start time.  The fourth argument, tdur, is the duration of data to fetch (typically seconds or milliseconds).  The last argument, tmode, is the time reference, which defines how tget and tdur are used to fetch data.  tmode can be one of “oldest”, “newest”, “after”, or “absolute”.  Let’s consider “oldest” (which our code uses) and “absolute”.

First, consider the simplest case, “absolute”: tget must be the actual, absolute start time (typically seconds or milliseconds since epoch), and getData returns data from tget to (tget+tdur).

When the mode is “oldest”, tget is interpreted as a relative offset from the oldest data in the source.  Say a source’s data spans epoch times 1488830000 to 1488831000.  With tmode=“oldest”, tget=100 and tdur=200, the call to getData returns data in the range (oldest time + 100) = 1488830100 to (oldest time + 100 + 200) = 1488830300.  In our code snippet above, since tmode is “oldest”, tget is 0 and tdur is (newestTime-oldestTime), the call to getData returns data in the range (oldestTime+0) to (oldestTime+0+(newestTime-oldestTime)); in other words, from oldestTime to newestTime, i.e., all the data!
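The window arithmetic described above can be sketched in a few lines.  This is only an illustration of the “absolute” and “oldest” cases as explained in this tutorial, not CTlib code: the TimeWindow class and its resolve method are hypothetical names, and the “newest” and “after” modes are deliberately left out because their behavior is not covered here.

```java
// Hypothetical helper that mirrors the arithmetic in the text; it is not CTlib code.
final class TimeWindow {
    private TimeWindow() {}

    // Returns {start, end} of the requested window, in the same units as the inputs.
    static double[] resolve(double oldestTime, double tget, double tdur, String tmode) {
        double start;
        switch (tmode) {
            case "absolute":
                start = tget;               // tget is the actual start time
                break;
            case "oldest":
                start = oldestTime + tget;  // tget is an offset from the oldest data
                break;
            default:
                throw new IllegalArgumentException("mode not covered by this sketch: " + tmode);
        }
        return new double[] { start, start + tdur };
    }

    public static void main(String[] args) {
        double oldestTime = 1488830000.0;
        double newestTime = 1488831000.0;

        // The worked example from the text: offset 100, duration 200.
        double[] part = resolve(oldestTime, 100.0, 200.0, "oldest");
        System.out.println((long) part[0] + " to " + (long) part[1]);  // 1488830100 to 1488830300

        // Offset 0 with duration (newestTime - oldestTime) spans all the data.
        double[] all = resolve(oldestTime, 0.0, newestTime - oldestTime, "oldest");
        System.out.println((long) all[0] + " to " + (long) all[1]);    // 1488830000 to 1488831000
    }
}
```

Calling resolve with tmode “absolute” and tget=1488830100 yields the same window as the first example, which shows that the two modes differ only in how the start time is derived.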

The full listing for SimpleSink.java is provided below.

To compile SimpleSink, you need to include the CloudTurbine library, CTlib.jar, on the javac classpath.  To run SimpleSink, the java classpath needs to include CTlib.jar as well as the folder where SimpleSink.class is located.

Run CTsource first to produce the sample data set.  If the CTsource data is located in CloudTurbine/CTdata/CTsource, run SimpleSink from the CloudTurbine directory so that the relative path CTdata/CTsource resolves to the sample data.  A truncated version of the SimpleSink output is given below.