Overview

Welcome to the Macrometa Geo-distributed Fast Data Platform (C8) API documentation!

The platform is a combination of

  1. Geo-distributed database - a multi-model, multi-master, global and geo-fenced real-time database.

  2. Global streams - global and geo-fenced streams that provide pub/sub, queuing and event processing.

  3. Compute - execution on the edge via RESTQL.

The platform is designed to sit across hundreds of worldwide locations (points of presence) and present one global, multi-master, real-time data platform (database and streams).

A fabric is a collection of edge data centers linked together as a single, high-performance cloud computing system consisting of storage, networking and processing functions. A fabric is created when a tenant account is provisioned with edge locations. Each fabric contains Collections, Streams, Functions and Geo Fabrics.

Geo Fabrics are subsets of the fabric and can be composed of one or more of the fabric's edge locations (as defined by the tenant). A geo fabric contains the following services:

  • Collections - Groupings of JSON documents, similar to tables in an RDBMS. You can create any number of collections in a geo fabric, and a collection can hold any number of documents.

  • Streams - A type of collection that captures data in motion. Streams support both pub/sub and queuing models. Publishers send messages via streams to consumers, who then act on the messages.

  • RESTQL - Lets users define compute as RESTQL. Once deployed, the platform orchestrates the RESTQL to execute on demand (i.e., serverless) in edge locations in response to client requests.

Data Models

In C8, documents are grouped into collections. A collection contains zero or more documents.

Note:
If you are familiar with relational database management systems (RDBMS), then it is safe to compare collections to tables and documents to rows. The difference is that in a traditional RDBMS, you have to define columns before you can store records in a table. Such definitions are also known as schemas.

C8 is schema-less, which means that there is no need to define what attributes a document can have. Every single document can have a completely different structure and still be stored together with other documents in a single collection.

Note:
In practice, there will be common denominators among the documents in a collection, but the database system itself doesn't force you to limit yourself to a certain data structure.

There are two types of collections:

  • Document collections - Also referred to as vertex collections in the context of graphs.
  • Edge collections - These collections store documents as well, but they include two special attributes, _from and _to, which are used to create relations between documents.

Usually, two documents (vertices) stored in document collections are linked by a document (edge) stored in an edge collection. This is the graph data model.

Note:
The graph data model follows the mathematical concept of a directed, labeled graph, except that edges don't just have labels, but are full-blown documents.
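
For illustration, here is a minimal sketch of two vertex documents and an edge document connecting them, written as Python dictionaries (the collection names and the collection/key reference format shown in _from and _to are illustrative assumptions):

    # Two vertex documents stored in a document (vertex) collection "persons".
    alice = {"_key": "alice", "name": "Alice"}
    bob = {"_key": "bob", "name": "Bob"}

    # An edge document stored in an edge collection "knows". _from and _to
    # reference the connected vertices; any other attributes (e.g. "since")
    # are ordinary document attributes attached to the edge.
    alice_knows_bob = {
        "_from": "persons/alice",
        "_to": "persons/bob",
        "since": 2019,
    }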

Collections exist inside of geofabrics. There can be one or many geofabrics. Different geofabrics are usually used for multi-tenant setups, as the data inside them (collections, documents, etc.) is isolated from the other geofabrics.

Note:
The default geofabric _system is special, because it cannot be removed. Users and permissions can be managed in this geofabric, and their credentials are valid for all geofabrics under that tenant.

C8 Geofabric supports multiple types of data models.

Key/Value model

The key/value store data model is the easiest to scale. In C8, this is implemented in the sense that a document collection always has a primary key attribute, _key, and in the absence of further secondary indexes, the document collection behaves like a simple key/value store.

The only operations possible in this context are single-key lookups and key/value pair insertions and updates. If _key is the only sharding attribute, then sharding is done with respect to the primary key and all these operations scale linearly.

If the sharding is done using different shard keys, then a lookup of a single key involves asking all shards and thus does not scale linearly.
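
The scaling argument can be illustrated with a small conceptual sketch (the hash function below is a stand-in, not C8's actual hashing):

    import hashlib

    NUM_SHARDS = 64  # C8 fixes the number of shards at 64

    def shard_for(value):
        # Map a shard-key value to a shard using a stand-in hash.
        digest = hashlib.md5(str(value).encode()).hexdigest()
        return int(digest, 16) % NUM_SHARDS

    # Sharding on _key: a lookup by _key is routed to exactly one shard.
    target_shard = shard_for("user-1234")        # single shard, scales linearly

    # Sharding on another attribute (e.g. "country"): a lookup by _key cannot
    # be routed, so every shard has to be asked.
    shards_to_ask = list(range(NUM_SHARDS))      # fan-out to all 64 shards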

Document model

The documents you can store in a regular collection closely follow the JSON format.

  • A document contains zero or more attributes, with each of these attributes having a value. A value can either be an atomic type, i.e., a number, string, boolean or null, or a compound type, i.e., an array or embedded document/object. Arrays and sub-objects can contain all of these types, which means that arbitrarily nested data structures can be represented in a single document (see the example after this list).

  • Documents are grouped into collections. A collection contains zero or more documents. If you are familiar with RDBMS, then it is safe to compare collections to tables, and documents to rows.

  • In a traditional RDBMS, you have to define columns before you can store records in a table. Such definitions are also known as schemas. Collections are schema-less, and there is no need to define what attributes a document must have. Documents can have a completely different structure and still be stored together with other documents in a single collection.

  • In practice, there will be common denominators among the documents in a collection, but C8 itself doesn't force you to limit yourself to a certain data structure.
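
For example, a single document can freely mix atomic and compound values (shown here as a Python dictionary; the attribute names are purely illustrative):

    order = {
        "_key": "order-1001",            # primary key attribute
        "customer": "alice",             # string
        "total": 42.5,                   # number
        "paid": False,                   # boolean
        "coupon": None,                  # null
        "items": [                       # array of embedded documents
            {"sku": "A-1", "qty": 2, "tags": ["new", "sale"]},
            {"sku": "B-7", "qty": 1},
        ],
        "shipping": {                    # nested object
            "city": "Berlin",
            "geo": {"lat": 52.52, "lon": 13.405},
        },
    }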

Graph model

You can turn your documents into graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the idea of a graph, which directly relates data items in the database.

  • A graph collection is simply a regular collection but with some special attributes that enable you to create graph queries and analyze the relationships between objects.

  • In SQL databases, you have the notion of a relation table to store n:m relationships between two data tables. An edge collection is somewhat similar to these relation tables; vertex collections resemble the data tables with the objects to connect.

  • While simple graph queries with a fixed number of hops via the relation table may be doable in SQL with several nested joins, graph databases can handle an arbitrary number of these hops over edge collections. This is called traversal. Also, edges in one edge collection may point to several vertex collections. It is common to attach attributes to edges, e.g. a label naming the connection.

  • Edges have a direction, with their relations _from and _to pointing from one document to another document stored in vertex collections. In queries you can define in which directions the edge relations may be followed.

Graph databases are particularly good at queries on graphs that involve paths of a priori unknown length, for example finding the shortest path between two vertices, or finding all paths that match a certain pattern starting at a given vertex.
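
To illustrate traversal over paths of unknown length, the following sketch finds a shortest path by breadth-first search over a handful of edge documents. It is a conceptual client-side example only; in practice such traversals are executed by the database itself:

    from collections import deque

    # Edge documents from a hypothetical "knows" edge collection.
    edges = [
        {"_from": "persons/alice", "_to": "persons/bob"},
        {"_from": "persons/bob", "_to": "persons/carol"},
        {"_from": "persons/alice", "_to": "persons/dave"},
    ]

    def shortest_path(start, goal):
        # Breadth-first search following edges in the _from -> _to direction.
        queue = deque([[start]])
        visited = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for edge in edges:
                if edge["_from"] == path[-1] and edge["_to"] not in visited:
                    visited.add(edge["_to"])
                    queue.append(path + [edge["_to"]])
        return None

    print(shortest_path("persons/alice", "persons/carol"))
    # ['persons/alice', 'persons/bob', 'persons/carol']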

However, if the vertices and edges along the occurring paths are distributed across the cluster, then a lot of communication is necessary between nodes, and performance suffers.

To achieve good performance at scale, it is therefore necessary to get the distribution of the graph data across the shards in the cluster right. Most of the time, the application developers and users of C8 know best how their graphs are structured.

Therefore, C8 allows users to specify the attributes by which the graph data is sharded. A useful first step is usually to make sure that the edges originating at a vertex reside on the same cluster node as the vertex.

Stream model

Streams are a type of collection in C8 that capture data-in-motion. Messages are sent via streams by publishers to consumers who then do something with the message. Streams can be created via client drivers (pyC8), REST API or the web console.

Streams unify queuing and pub/sub messaging into a single messaging model that gives users the flexibility to consume messages in the way that best suits the use case at hand.

producer→stream→subscription→consumer

  • A stream is a named channel for sending messages. Each stream is backed by a distributed append-only log and can be local (at one edge location only) or global (across all edge locations in the Super Fabric). Similarly, streams can be persistent or non-persistent.

  • Messages from publishers are only stored once on a stream, and can be consumed as many times as necessary by consumers. The stream is the source of truth for consumption. Although messages are only stored once on the stream, there can be different ways of consuming these messages.

  • Consumers are grouped together for consuming messages. Each group of consumers is a subscription on a stream. Each consumer group can have its own way of consuming the messages—exclusively, shared, or failover.

Messages are the basic "unit" of streams in the C8 platform. They're what producers publish to streams and what consumers then consume from streams (and acknowledge when the message has been processed).

Messages are the analogue of letters in a postal service system.

Each message has the following components:

  • Value / data payload - The data carried by the message. All C8 stream messages carry raw bytes, although message data can also conform to data schemas in the future.
  • Key - Messages can optionally be tagged with keys, which can be useful for things like stream compaction.
  • Properties - An optional key/value map of user-defined properties.
  • Producer name - The name of the producer that produced the message (producers are automatically given default names, but you can apply your own explicitly as well).
  • Sequence ID - Each stream message belongs to an ordered sequence on its stream. A message's sequence ID is its position in that sequence.
  • Publish time - The timestamp of when the message was published (automatically applied by the producer).
  • Event time - An optional timestamp that applications can attach to the message, representing when something happened, e.g. when the message was processed. The event time of a message is 0 if none is explicitly set.
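
To make these components concrete, here is a minimal sketch of a stream message as a Python dataclass. It is illustrative only and is not the pyC8 client API:

    from dataclasses import dataclass, field
    from typing import Dict, Optional

    @dataclass
    class StreamMessage:
        data: bytes                               # value / data payload (raw bytes)
        key: Optional[str] = None                 # optional key, e.g. for compaction
        properties: Dict[str, str] = field(default_factory=dict)  # user-defined properties
        producer_name: str = "default-producer"   # name of the producing client
        sequence_id: int = 0                      # position in the stream's ordered sequence
        publish_time: int = 0                     # set by the producer when publishing
        event_time: int = 0                       # optional application timestamp; 0 if unset

    msg = StreamMessage(data=b'{"temperature": 21.5}', key="sensor-42")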

RESTQL

RESTQL in the C8 platform enables you to build and execute CRUD and processing logic close to where the data resides. Any RESTQL is automatically available via a secure REST API, enabling you to consume events from the web, streams and collections without repetitive boilerplate coding.
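
As a sketch of the idea, a saved query could be executed over HTTPS with any HTTP client. Note that the endpoint path, query name, bind-parameter shape and API-key header below are hypothetical placeholders, not the documented RESTQL API:

    import requests

    BASE_URL = "https://api.example.com"         # replace with your C8 region URL
    TENANT, FABRIC = "demotenant", "demofabric"  # hypothetical tenant and geo fabric
    headers = {"Authorization": "apikey <your-api-key>"}  # hypothetical auth header

    # Execute a previously saved RESTQL query by name, passing bind parameters.
    resp = requests.post(
        f"{BASE_URL}/_tenant/{TENANT}/_fabric/{FABRIC}/restql/execute/getRecentOrders",
        json={"bindVars": {"customer": "alice"}},
        headers=headers,
    )
    print(resp.status_code, resp.json())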

Your development agility comes from building systems composed of small, independent units of functionality focused on doing one thing well. RESTQL lets you build and deploy services at the level of a single function, not at the level of entire applications, containers, or VMs.

What you can do with RESTQL:

  • Serverless application backends
  • Real-time data processing (ETL, transformations, aggregations, etc.)
  • Data Pipelines

Highlights:

  • Simplest way to run your code inside the GeoFabric.
  • No servers to provision, manage, patch or update
  • Automatic scaling, highly available and fault tolerant

Concepts

Sharding

C8 organizes its collection data within a datacenter in shards. Sharding allows the use of multiple machines to run a cluster of C8 instances that together constitute a single database.

Shards are configured per collection, so multiple shards of data together form the collection as a whole. To determine the shard in which a document is stored, C8 computes a hash across the shard key values. By default, this hash is computed from _key.

The number of shards is fixed at 64; there is no option for the user to configure the number of shards. The hash can instead be computed from another attribute, and you can also specify multiple shardKeys.

Note:
If you change the shard keys from their default ["_key"], then finding a document in the collection by its primary key involves a request to every single shard. Also, you can no longer prescribe the primary key value of a new document, but must use the automatically generated one. This restriction comes from the fact that ensuring uniqueness of the primary key would be very inefficient if the user could specify the primary key.

The system decides on which node in the cluster a particular shard is kept. There is no option for users to configure an affinity based on certain shard keys.

Unique indexes (hash, skiplist, persistent) on sharded collections are only allowed if the fields used to determine the shard key are also included in the list of attribute paths for the index:

shardKeys    indexKeys    allowed
a            a            ok
a            b            not ok
a            a, b         ok
a, b         a            not ok
a, b         b            not ok
a, b         a, b         ok
a, b         a, b, c      ok
a, b, c      a, b         not ok
a, b, c      a, b, c      ok
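
The rule behind this table can be stated compactly: a unique index on a sharded collection is allowed only if every shard key is included in the index keys. A small sketch:

    def unique_index_allowed(shard_keys, index_keys):
        # A unique index is allowed only if all shard keys are covered by the index.
        return set(shard_keys) <= set(index_keys)

    assert unique_index_allowed(["a"], ["a"])                      # ok
    assert not unique_index_allowed(["a"], ["b"])                  # not ok
    assert unique_index_allowed(["a", "b"], ["a", "b", "c"])       # ok
    assert not unique_index_allowed(["a", "b", "c"], ["a", "b"])   # not ok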

Replication

Replication within a GeoFabric region

Replication within a datacenter is synchronous. Synchronous replication stores a copy of a shard's data on another database storage node and keeps it in sync. Essentially, when storing data, the cluster waits for all replicas in the datacenter to write the data before greenlighting the write operation to the client.

This will naturally increase the latency a bit, since one more network hop is needed for each write. However, it will enable the cluster to immediately fail over to a replica whenever an outage has been detected, without losing any committed data, and mostly without even signaling an error condition to the client.

The replication is organized such that every shard has a leader and r-1 followers, where r denotes the replication factor. The replicationFactor parameter is the total number of copies being kept, that is, one plus the number of followers. The platform maintains two copies of the data, i.e., replicationFactor = 2.

For each shard, the platform automatically determines suitable leaders and followers within each datacenter. When data of a shard is requested, only the current leader is asked, whereas followers only keep their copy in sync.

Replication across Datacenters

Replication across datacenters is done by leveraging streams. The replication is asynchronous and causally ordered.

HTTP Request Handling

Protocol

The platform exposes its API via HTTPS, making the server easily accessible with a variety of clients and tools (e.g., browsers, curl, telnet). The communication is SSL-encrypted.

The platform uses the standard HTTP methods (e.g. *GET*, *POST*, *PUT*, *DELETE*) plus the *PATCH* method described in RFC 5789.

Most server APIs expect clients to send any payload data in JSON format. Details on the expected format and JSON attributes can be found in the documentation of the individual server methods.

Clients sending requests to C8 must use either HTTP 1.0 or HTTP 1.1. Other HTTP versions are not supported by C8 and any attempt to send a different HTTP version signature will result in the server responding with an HTTP 505 (HTTP version not supported) error.

Note:
The platform will always respond to client requests with HTTP 1.1. Clients should therefore support HTTP version 1.1.

Clients are required to include the *Content-Length* HTTP header with the correct content length in every request that can have a body (e.g. POST, PUT or PATCH). C8 will not process requests without a *Content-Length* header; thus chunked transfer encoding for POST documents is not supported.
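
Most HTTP client libraries take care of this automatically. For example, with the Python requests library the *Content-Length* header is computed from the serialized body (the URL and credentials below are placeholders, not a documented endpoint):

    import requests

    url = "https://api.example.com/some/collection/endpoint"  # placeholder URL
    headers = {"Authorization": "apikey <your-api-key>"}      # placeholder credentials

    body = {"_key": "doc1", "value": 42}

    # requests serializes the JSON body and sets Content-Length for it, so the
    # request is sent with a fixed length rather than chunked transfer encoding.
    resp = requests.post(url, json=body, headers=headers)
    print(resp.status_code)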

HTTP Keep-Alive

C8 supports HTTP keep-alive. If the client does not send a *Connection* header in its request, and the client uses HTTP version 1.1, C8 will assume the client wants to keep alive the connection.

If clients do not wish to use the keep-alive feature, they should explicitly indicate that by sending a Connection: Close HTTP header in the request.

C8 will close connections automatically for clients that send requests using HTTP 1.0, except if they send a *Connection: Keep-Alive* header.

Note:
Establishing TCP connections is expensive, since it takes several round trips between the communicating parties. Therefore, you should use connection keep-alive to send several HTTP requests over one TCP connection.

Each request is treated independently by definition. You can use this feature to build up a so-called connection pool with several established connections in your client application, and dynamically reuse an idle connection for subsequent requests.
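
For example, with the Python requests library a Session object keeps connections alive and reuses them for subsequent requests (the URLs are placeholders):

    import requests

    session = requests.Session()  # maintains a pool of keep-alive connections

    # Both calls can reuse the same TCP connection while the server keeps it open.
    first = session.get("https://api.example.com/resource-a")
    second = session.get("https://api.example.com/resource-b")

    # To opt out of keep-alive for a single request, send Connection: Close.
    one_shot = requests.get("https://api.example.com/resource-a",
                            headers={"Connection": "Close"})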

Error Handling

The following should be noted about how the platform handles client errors in its HTTP layer:

Client requests using an HTTP version signature different than HTTP/1.0 or HTTP/1.1 will get an HTTP 505 (HTTP version not supported) error in return.

C8 will reject client requests with a negative value in the Content-Length request header with HTTP 411 (Length Required). C8 does not support POST with transfer-encoding: chunked, which would omit the Content-Length header described above.

The maximum URL length accepted by the platform is 16K. Incoming requests with longer URLs will be rejected with an HTTP 414 (Request-URI too long) error.

If the client sends a Content-Length header with a value bigger than 0 for an HTTP GET, HEAD, or DELETE request, the platform will process the request, but will write a warning to its log file.

When the client sends a Content-Length header that has a value that is lower than the actual size of the body sent, C8 will respond with HTTP 400 (Bad Request).

If clients send a Content-Length value bigger than the actual size of the body of the request, C8 will wait for about 90 seconds for the client to complete its request. If the client does not send the remaining body data within this time, C8 will close the connection. Clients should avoid sending such malformed requests as this will block one TCP connection, and may lead to a temporary file descriptor leak.

When clients send a body or a Content-Length value bigger than the maximum allowed value (512 MB), C8 will respond with HTTP 413 (Request Entity Too Large).

If the overall length of the HTTP headers a client sends for one request exceeds the maximum allowed size (1 MB), the server will fail with HTTP 431 (Request Header Fields Too Large).

If clients request an HTTP method that is not supported by the server, C8 will return with HTTP 405 (Method Not Allowed). The platform supports the following HTTP methods:

  • GET
  • POST
  • PUT
  • DELETE
  • HEAD
  • PATCH
  • OPTIONS
Note:
Not all server actions allow using all of these HTTP methods. You should look up the supported methods for each resource you intend to use.
Requests using any other HTTP method (such as CONNECT, TRACE, etc.) will be rejected by the platform as mentioned before.
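
A client can map these status codes back to the conditions described above. A minimal sketch using the Python requests library (the URL is a placeholder):

    import requests

    KNOWN_ERRORS = {
        405: "Method Not Allowed",
        411: "Length Required",
        413: "Request Entity Too Large",
        414: "Request-URI Too Long",
        431: "Request Header Fields Too Large",
        505: "HTTP Version Not Supported",
    }

    resp = requests.get("https://api.example.com/some/resource")  # placeholder URL
    if resp.status_code in KNOWN_ERRORS:
        print("Request rejected:", resp.status_code, KNOWN_ERRORS[resp.status_code])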

Cross-Origin Resource Sharing (CORS) requests

The platform will automatically handle CORS requests as follows:

Preflight

When a browser is told to make a cross-origin request that includes explicit headers, credentials or uses HTTP methods other than GET or POST, it will first perform a so-called preflight request using the OPTIONS method.

C8 will respond to OPTIONS requests with an HTTP 200 status response with an empty body. Since preflight requests are not expected to include or even indicate the presence of authentication credentials even when they will be present in the actual request, C8 does not enforce authentication for OPTIONS requests even when authentication is enabled.

C8 will set the following headers in the response:

  • access-control-allow-credentials: will be set to false by default. For details on when it will be set to true see the next section on cookies.

  • access-control-allow-headers: will be set to the exact value of the request's access-control-request-headers header or omitted if no such header was sent in the request.

  • access-control-allow-methods: will be set to a list of all supported HTTP methods, regardless of the target endpoint. In other words, the fact that a method is listed in this header does not guarantee that it will be supported by the endpoint in the actual request.

  • access-control-allow-origin: will be set to the exact value of the request's origin header.

  • access-control-expose-headers: will be set to a list of response headers used by the C8 HTTP API.

  • access-control-max-age: will be set to an implementation-specific value.

Actual request

If a request using any other HTTP method than OPTIONS includes an origin header, C8 will add the following headers to the response:

  • access-control-allow-credentials: will be set to false by default. For details on when it will be set to true see the next section on cookies.

  • access-control-allow-origin: will be set to the exact value of the request's origin header.

  • access-control-expose-headers: will be set to a list of response headers used by the C8 HTTP API.

Cookies and Authentication

In order for the client to be allowed to correctly provide authentication credentials or handle cookies, C8 needs to set the access-control-allow-credentials response header to true instead of false. C8 will automatically set this header to true if the value of the request's origin header matches a trusted origin in the http.trusted-origin configuration option.

Note that browsers will not actually include credentials or cookies in cross-origin requests unless explicitly told to do so:

  • When using the Fetch API you need to set the
    credentials option to include.

    fetch("./", { credentials:"include" }).then(/* … */)
    
  • When using XMLHttpRequest you need to set the withCredentials option to true.

    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'https://example.com/', true);
    xhr.withCredentials = true;
    xhr.send(null);
    
  • When using jQuery you need to set the xhrFields option:

    $.ajax({
       url: 'https://example.com',
       xhrFields: {
          withCredentials: true
       }
    });
    

Load-balancer Support

C8 exposes some APIs which store request state data on specific coordinator nodes, and thus subsequent requests which require access to this state must be served by the coordinator node which owns this state data.

In order to function behind a load balancer, C8 can transparently forward requests within the cluster to the correct node. If a request is forwarded, the response will contain the following custom HTTP header whose value will be the ID of the node which actually answered the request: *x-c8-request-forwarded-to*

The following APIs may use request forwarding:

  • /_tenant/{tenant-name}/_fabric/{fabricname}/cursor
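
A client can detect that a request was forwarded by inspecting the response header. A minimal sketch against the cursor endpoint listed above (tenant, fabric, credentials and the request body are placeholders):

    import requests

    BASE_URL = "https://api.example.com"          # replace with your C8 region URL
    TENANT, FABRIC = "demotenant", "demofabric"   # placeholder names
    headers = {"Authorization": "apikey <your-api-key>"}  # placeholder credentials

    resp = requests.post(
        f"{BASE_URL}/_tenant/{TENANT}/_fabric/{FABRIC}/cursor",
        json={"query": "..."},  # the cursor request body is not shown here
        headers=headers,
    )

    # If the request was forwarded to another coordinator, this header names it.
    forwarded_to = resp.headers.get("x-c8-request-forwarded-to")
    if forwarded_to:
        print("Request was answered by node", forwarded_to)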