In part I of this series, we discussed applications, which provide the model and data provider, and sessions, which encapsulate high-level data context. In part II, we covered command types and inputs to the data pipeline.
In this article, we're going to take a look at the data pipeline itself.
The primary goal of the data pipeline is, of course, to correctly execute each query to retrieve data and each command to store, delete or refresh data. The diagram to the right shows that the pipeline consists of several data handlers. Some of these refer to data sources, which can be almost anything: an SQL database or a remote service, for example.[^1]
The name "pipeline" is only somewhat appropriate: A command can jump out anywhere in the pipeline rather than just at the opposite end. A given command will be processed through the various data handlers until one of them pronounces the command to be "complete".
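The shape of this "pipeline" is essentially a chain of responsibility. The following is a minimal Python sketch of the idea (Quino itself is a C#/.NET framework, so all names here are illustrative, not Quino's actual API): each handler gets a chance at the command, and processing stops as soon as one of them pronounces it complete.

```python
from dataclasses import dataclass, field

@dataclass
class CommandContext:
    operation: str                      # e.g. "load", "save" or "delete"
    objects: list = field(default_factory=list)
    complete: bool = False
    handled_by: str = ""

class AnalyzerHandler:
    name = "analyzer"
    def handle(self, context):
        # A save with no objects can be completed without touching a data source.
        if context.operation == "save" and not context.objects:
            context.complete = True

class DataSourceHandler:
    name = "data source"
    def handle(self, context):
        # The data source at the end of the chain always finishes the command.
        context.complete = True

def execute(context, handlers):
    # Every command enters at the same end; any handler may pronounce it complete.
    for handler in handlers:
        handler.handle(context)
        if context.complete:
            context.handled_by = handler.name
            break
    return context
```

A trivial save "jumps out" at the analyzer, while a load travels all the way to the data source.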
Command context: recap
In the previous parts, we learned that the input to the pipeline is an `IDataCommandContext`. To briefly recap, this object has the following properties:
- Session: Defines the context within which to execute the command
- Handler: Implements an abstraction for reading/writing values and flags to the objects (e.g. `SetValue(IMetaProperty)`); more detail on this later
- Objects: The sequence of objects on which to operate (e.g. for save commands) or to return (e.g. for load commands)
- ExecutableQuery: The query to execute when loading or deleting objects
- MetaClass: The metadata that describes the root object in this command; more detail on this later as well
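The properties above can be summarized in a small sketch. This is an illustrative Python mirror of the `IDataCommandContext` shape described in the list, not Quino's actual (C#) declaration:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class DataCommandContext:
    session: Any                                  # context in which to execute the command
    handler: Any                                  # reads/writes values and flags on objects
    objects: list = field(default_factory=list)   # objects to operate on or to return
    executable_query: Optional[Any] = None        # query for load/delete commands
    meta_class: Optional[Any] = None              # metadata describing the root object
```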
Where the pipeline metaphor holds up is that the command context will always start at the same end. The ordering of data handlers is intended to reduce the amount of work and time invested in processing a given command.
The first stage of processing is to quickly analyze the command to handle cases where there is nothing to do. For example,
- The command is to save or delete, but the sequence of `Objects` is empty
- The command is to save or reload, but none of the objects in the sequence of `Objects` has been modified
- The command is to load data, but the query restricts to a `null` value in the primary key or a foreign key that references a non-nullable, unique key
It is useful to capture these checks in one or more analyzers for the following reasons:
- All drivers share a common implementation for efficiency checks
- Optimizations are applied independent of the data sources used
- Driver code focuses on driver-specifics rather than general optimization
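The trivial cases listed above can be sketched as a single analyzer function. The structure below is hypothetical (Quino's analyzers are C# components); it only illustrates how a command can be completed without any data-source I/O:

```python
from dataclasses import dataclass, field

@dataclass
class AnalyzedCommand:
    operation: str                      # "load", "save", "delete" or "reload"
    objects: list = field(default_factory=list)
    restricts_to_null_key: bool = False # query restricts on a null (foreign) key

def analyze(command):
    """Return True if the command can be completed without touching a data source."""
    if command.operation in ("save", "delete") and not command.objects:
        return True    # nothing to save or delete
    if command.operation in ("save", "reload") and command.objects and not any(
            obj.get("changed", False) for obj in command.objects):
        return True    # no object in the sequence has been modified
    if command.operation == "load" and command.restricts_to_null_key:
        return True    # a null key can never match a non-nullable, unique key
    return False
```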
If the analyzer hasn't categorically handled the command and the command is to load data, the next step is to check caches. For the purposes of this article, there are two things that affect how long data is cached:
- If the session is in a transacted state, then only immutable data, data that was loaded before the transaction began, or data loaded within that transaction can be used. Data loaded or saved by other sessions -- possibly to global caches -- is not visible to a session that is inside a transaction.
- The metadata associated with the objects can include configuration settings that control maximum caching lifetime as well as an access-timeout. The default settings are good for general use but can be tweaked for specific object types.
Caches currently include the following standard handlers[^2]:
- `ValueListDataHandler` returns immutable data. Since the data is immutable, it can be used regardless of the transaction state of the session in which the command is executed.
- `SessionCacheDataHandler` returns data that has already been loaded or saved in this session, to avoid a call to a possibly high-latency back-end. This data is safe to use in a session with transactions because the cache is rolled back when a transaction is rolled back.
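The rollback behavior of the session cache can be sketched with a snapshot per transaction. This is a simplified Python illustration of the idea, not the real `SessionCacheDataHandler` implementation:

```python
class SessionCache:
    """A per-session cache whose entries roll back with the transaction."""

    def __init__(self):
        self._data = {}
        self._snapshots = []

    def begin_transaction(self):
        self._snapshots.append(dict(self._data))   # remember pre-transaction state

    def commit(self):
        self._snapshots.pop()                      # keep everything cached since begin

    def rollback(self):
        self._data = self._snapshots.pop()         # discard data cached in the transaction

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)
```

Data cached before the transaction survives a rollback; data cached inside it does not.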
If the analyzer and cache haven't handled a command, then we're finally at a point where we can no longer avoid a call to a data source. Data sources can be internal or external.
The most common type is an external database:
- PostgreSQL 8.x and higher (PostgreSQL 9.x for schema migration)
- SQL Server 2008 and higher (with schema migration)
- MongoDB (no schema; no migration)
- SQLite (not yet released)
Another standard data source is the Quino remote application server, which provides a classic interface- and method-based service layer as well as mapping nearly the full power of Quino's generalized querying capabilities to an application server. That is, an application can smoothly switch from a direct connection to a database to using the remoting driver to call into a service layer instead.
The remoting driver supports both binary and JSON protocols. Further details are also beyond the scope of this article, but this driver has proven quite useful for scaling smaller client-heavy applications with a single database to thin clients talking to an application server.
And finally, there is another way to easily include "mini" data drivers in an application. Any metaclass can include an `IDataHandlerAspect` that defines its own data driver as well as its capabilities. Most implementations use this technique to bind in immutable lists of data. But this technique has also been used to load/save data from/to external APIs, like REST services. We can take a look at some examples in more detail in another article.
The mini data driver created for use with an aspect can relatively easily be converted to a full-fledged data handler.
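The aspect idea can be sketched as follows. Everything here is hypothetical naming (the real mechanism is Quino's C# `IDataHandlerAspect`); the point is that a metaclass can carry its own mini driver, which the pipeline prefers over the general data source:

```python
class ImmutableListAspect:
    """A 'mini driver' that serves a fixed, read-only list of objects."""

    def __init__(self, rows):
        self._rows = list(rows)

    def load(self, query=None):
        return list(self._rows)         # immutable data: always the same answer

    def save(self, objects):
        raise NotImplementedError("this aspect's data is read-only")

class MetaClass:
    def __init__(self, name, data_handler_aspect=None):
        self.name = name
        self.data_handler_aspect = data_handler_aspect

def load_objects(meta_class, fallback_loader):
    # The pipeline uses the metaclass's own driver when one is attached.
    aspect = meta_class.data_handler_aspect
    return aspect.load() if aspect else fallback_loader()
```

A REST-backed aspect would look the same from the outside: only `load`/`save` would call an external API instead of returning a fixed list.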
The last step in a command is what Quino calls "local evaluation". Essentially, if a command cannot be handled entirely within the rest of the data pipeline -- by an analyzer, one or more caches or the data source for that type of object -- then the local evaluator completes the command.
What does this mean? Any orderings or restrictions in a query that cannot be mapped to the data source (e.g. a C# lambda is too complex to map to SQL) are evaluated on the client rather than the server. Therefore, any query that can be formulated in Quino can also be evaluated fully by the data pipeline -- the question is only of how much of it can be executed on the server, where it would (usually) be more efficient to do so.
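The split between server-side and client-side evaluation can be sketched as follows. This is an illustrative Python simplification (Quino's local evaluator works on C# expressions); conditions the back-end can map run "on the server", and the local evaluator applies the rest to the result:

```python
def execute_query(rows, server_conditions, local_conditions):
    # The data source evaluates what it can map to its query language
    # (simulated here by filtering in memory)...
    from_server = [r for r in rows if all(c(r) for c in server_conditions)]
    # ...and the local evaluator applies the unmappable rest on the client.
    return [r for r in from_server if all(c(r) for c in local_conditions)]
```

The answer is the same either way; pushing more conditions into `server_conditions` just means less data travels to the client.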
Please see the article series that starts with Optimizing data access for high-latency networks for specific examples.
In this article, we've learned a bit about the ways in which Quino retrieves and stores data using the data pipeline. In the next part, we'll cover the topic of Builders & Commands.
[^1]: E.g. Quino uses a ProtoBuf-like protocol to communicate with its standard application server.

[^2]: There is an open issue to introduce a global cache for immutable objects or objects not used in a transaction.