Welcome to the Macrometa Geo-distributed Fast Data Platform API documentation!
Today’s applications are required to be highly responsive and always online. To achieve low latency and high availability, instances of these applications need to be deployed in datacenters that are close to their users. Applications need to respond in real time to large changes in usage at peak hours, store ever increasing volumes of data, and make this data available to users in milliseconds.
Macrometa Geo-distributed Fast Data Platform (C8) is a combination of:
- Geo-distributed database, i.e., a multi-model, multi-master, global and geo-fenced real-time database.
- Global streams, i.e., global and geo-fenced streams that provide pub/sub, queuing and event processing.
- Compute for execution on the edge via RESTQL.
The platform is designed to span hundreds of locations (PoPs) worldwide and present one global, multi-master, real-time data platform (database and streams).
Fabric is a collection of edge data centers linked together as a single, high performance cloud computing system consisting of storage, networking and processing functions. A fabric is created when a tenant account is provisioned with the edge locations. Each fabric contains
Geo Fabrics are subsets of the fabric and can be composed of one or more of the fabric's edge locations, as defined by the tenant. A geo fabric contains the following services:
- Collections: groupings of JSON documents, similar to tables in an RDBMS. You can create any number of collections in a geo fabric, and a collection can hold any number of documents.
- Streams: a type of collection that captures data in motion. Streams support both pub-sub and queuing models. Messages are sent via streams by publishers to consumers, who then do something with the message.
- RESTQL: lets users define compute as RESTQL. Once deployed, the platform orchestrates the RESTQL to execute on demand (i.e., serverless) in edge locations in response to requests from clients.
In C8, documents are grouped into collections. A collection contains zero or more documents.
C8 is schema-less, which means that there is no need to define what attributes a document can have. Every single document can have a completely different structure and still be stored together with other documents in a single collection.
There are two types of collections:
- Document collections - Also referred to as vertex collections in the context of graphs.
- Edge collections - These collections store documents as well, but they include two special attributes, _from and _to, which are used to create relations between documents.
Usually, two documents (vertices) stored in document collections are linked by a document (edge) stored in an edge collection. This is the graph data model.
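As a rough illustration of the vertex/edge relationship described above, the sketch below uses plain Python dicts in place of stored documents. The collection names (`people`, `knows`), the `since` attribute and the `collection/key` identifier format in `_id`, `_from` and `_to` are assumptions made for the example, not something this page specifies.

```python
# Illustrative only: plain dicts stand in for stored documents.
# Collection names and the "collection/key" id format are assumptions.

# Two vertex documents in a hypothetical document collection "people".
alice = {"_id": "people/alice", "_key": "alice", "name": "Alice"}
bob = {"_id": "people/bob", "_key": "bob", "name": "Bob"}

# One edge document in a hypothetical edge collection "knows".
# _from and _to hold the identifiers of the two linked vertices.
knows = {
    "_key": "alice-bob",
    "_from": "people/alice",
    "_to": "people/bob",
    "since": 2019,  # edges can carry their own attributes
}

vertices = {doc["_id"]: doc for doc in (alice, bob)}


def resolve(edge, vertices):
    """Return the (source, target) vertex documents that an edge connects."""
    return vertices[edge["_from"]], vertices[edge["_to"]]


source, target = resolve(knows, vertices)
print(source["name"], "->", target["name"])
```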
Collections exist inside geofabrics. There can be one or many geofabrics. Different geofabrics are usually used for multi-tenant setups, because the data inside each one (collections, documents, etc.) is isolated from the data in the others.
C8 Geofabric supports multiple types of data models.
The key/value store data model is the easiest to scale. In C8, this is implemented in the sense that a document collection always has a primary key attribute, _key, and in the absence of further secondary indexes the document collection behaves like a simple key/value store.
The only operations that are possible in this context are single key lookups and key/value pair insertions and updates. If _key is the only sharding attribute, then the sharding is done with respect to the primary key and all these operations scale linearly.
If the sharding is done using different shard keys, then a lookup of a single key involves asking all shards and thus does not scale linearly.
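The difference between the two sharding choices can be sketched with a toy in-memory model. This is not how C8 implements sharding; the `ShardedCollection` class, the hash function and the shard count are hypothetical and only show why a lookup by `_key` touches one shard in the first case and every shard in the second.

```python
import hashlib


class ShardedCollection:
    """Toy model of a collection split across shards (hypothetical, in-memory)."""

    def __init__(self, shard_key="_key", num_shards=4):
        self.shard_key = shard_key
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_index(self, value):
        # Stable hash so the same value always maps to the same shard.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def insert(self, doc):
        index = self._shard_index(doc[self.shard_key])
        self.shards[index][doc["_key"]] = doc

    def get(self, key):
        """Look up a document by _key; return (document, shards contacted)."""
        if self.shard_key == "_key":
            # The key itself tells us which shard owns the document.
            shard = self.shards[self._shard_index(key)]
            return shard.get(key), 1
        # Sharded by another attribute: the key does not reveal the shard,
        # so every shard has to be asked.
        hits = [shard[key] for shard in self.shards if key in shard]
        return (hits[0] if hits else None), len(self.shards)


doc = {"_key": "u1", "region": "eu"}
by_key = ShardedCollection(shard_key="_key")
by_region = ShardedCollection(shard_key="region")
by_key.insert(doc)
by_region.insert(doc)
print(by_key.get("u1")[1], by_region.get("u1")[1])
```

With four shards, the lookup on `by_key` contacts a single shard while the same lookup on `by_region` contacts all four.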
The documents you can store in a regular collection closely follow the JSON format.
A document contains zero or more attributes with each of these attributes having a value. A value can either be an atomic type, i.e. number, string, boolean or null, or a compound type, i.e. an array or embedded document/object. Arrays and sub-objects can contain all of these types, which means that arbitrarily nested data structures can be represented in a single document.
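For instance, a single document could combine all of these value types. The sketch below builds one as a Python dict and round-trips it through JSON; every attribute name and value is made up for illustration.

```python
import json

# Hypothetical document; all attribute names and values are illustrative.
document = {
    "_key": "order-1001",
    "total": 42.5,        # number
    "customer": "Alice",  # string
    "paid": True,         # boolean
    "coupon": None,       # null
    "items": [            # array of embedded objects
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1, "tags": ["gift", "fragile"]},
    ],
    "shipping": {         # embedded object containing another object
        "city": "Lisbon",
        "geo": {"lat": 38.72, "lon": -9.14},
    },
}

# Serialize to JSON text and parse it back.
encoded = json.dumps(document)
decoded = json.loads(encoded)

# Nested values stay reachable through ordinary indexing.
print(decoded["items"][1]["tags"][0])
```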
Documents are grouped into collections. A collection contains zero or more documents. If you are familiar with RDBMS, then it is safe to compare collections to tables, and documents to rows.
In a traditional RDBMS, you have to define columns before you can store records in a table. Such definitions are also known as schemas. Collections are schema-less, and there is no need to define what attributes a document must have. Documents can have a completely different structure and still be stored together with other documents in a single collection.
In practice, there will be common denominators among the documents in a collection, but C8 itself doesn't force you to limit yourself to a certain data structure.
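A small sketch of what this can look like: three differently shaped documents sit in the same collection (modelled here as a plain Python list), and the shared attributes are computed after the fact rather than declared up front. The documents themselves are invented for the example.

```python
# A hypothetical collection holding documents with different structures.
collection = [
    {"_key": "1", "name": "Alice", "email": "alice@example.com"},
    {"_key": "2", "name": "Bob", "phones": ["555-0100", "555-0101"]},
    {"_key": "3", "name": "sensor-7", "reading": {"temp": 21.5}},
]

# No schema was declared, but the common denominator can still be derived.
common_attributes = set.intersection(*(set(doc) for doc in collection))
print(sorted(common_attributes))
```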
You can turn your documents into graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the idea of a graph, which directly relates data items in the database.
A graph collection is simply a regular collection but with some special attributes that enable you to create graph queries and analyze the relationships between objects.
In SQL databases, you have the notion of a relation table to store n:m relationships between two data tables. An edge collection is somewhat similar to these relation tables; vertex collections resemble the data tables with the objects to connect.
While simple graph queries with a fixed number of hops via the relation table may be doable in SQL with several nested joins, graph databases can handle an arbitrary number of these hops over edge collections. This is called traversal. Edges in one edge collection may also point to several vertex collections. It is common to attach attributes to edges, e.g. a label naming the connection.
Edges have a direction, with their relations _from and _to pointing from one document to another document stored in vertex collections. In queries, you can define in which directions the edge relations may be followed.
Graph databases are particularly good at queries that involve paths of an a priori unknown length. Examples include finding the shortest path between two vertices in a graph, or finding all paths that match a certain pattern starting at a given vertex.
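To make the idea of a traversal with an unknown number of hops concrete, here is a minimal breadth-first search over edge documents, written in plain Python rather than in the platform's query language. The edge data and the `shortest_path` helper are hypothetical; edges are followed only in the `_from` to `_to` direction.

```python
from collections import deque

# Hypothetical edge documents; only _from and _to matter for the traversal.
edges = [
    {"_from": "cities/a", "_to": "cities/b"},
    {"_from": "cities/b", "_to": "cities/c"},
    {"_from": "cities/a", "_to": "cities/d"},
    {"_from": "cities/d", "_to": "cities/c"},
    {"_from": "cities/c", "_to": "cities/e"},
]


def shortest_path(edges, start, goal):
    """Breadth-first search that follows edges from _from to _to.

    Returns the list of vertex ids on a shortest path, or None if the
    goal cannot be reached from the start.
    """
    outgoing = {}
    for edge in edges:
        outgoing.setdefault(edge["_from"], []).append(edge["_to"])

    previous = {start: None}  # vertex -> vertex it was reached from
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbor in outgoing.get(vertex, []):
            if neighbor not in previous:
                previous[neighbor] = vertex
                queue.append(neighbor)
    return None


print(shortest_path(edges, "cities/a", "cities/e"))
```

The number of hops is never stated in advance; the search simply keeps expanding until it reaches the goal or runs out of edges to follow.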
However, if the vertices and edges along the occurring paths are distributed across the cluster, then a lot of communication is necessary between nodes, and performance suffers.
To achieve good performance at scale, it is therefore necessary to get the distribution of the graph data across the shards in the cluster right. Most of the time, the application developers and users of C8 know best how their graphs are structured.
Therefore, C8 allows users to specify the attributes by which the graph data is sharded. A useful first step is usually to make sure that the edges originating at a vertex reside on the same cluster node as the vertex.
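One way to picture that first step: if a vertex is placed by hashing its identifier and an edge is placed by hashing its `_from` value, then an edge always lands on the same shard as the vertex it originates from. The sketch below is a toy placement function under that assumption; the function names and the shard count are hypothetical and do not describe C8's configuration options.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(value, num_shards=NUM_SHARDS):
    """Map a string to a shard index with a stable hash."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def place_vertex(vertex_id):
    """Place a vertex by its own identifier."""
    return shard_of(vertex_id)


def place_edge(edge):
    """Place an edge by its _from value, next to its originating vertex."""
    return shard_of(edge["_from"])


edges = [
    {"_from": "people/alice", "_to": "people/bob"},
    {"_from": "people/alice", "_to": "people/carol"},
    {"_from": "people/bob", "_to": "people/carol"},
]

# Every edge is co-located with the vertex it starts from.
colocated = all(place_edge(e) == place_vertex(e["_from"]) for e in edges)
print(colocated)
```

With this placement, expanding the outgoing edges of a vertex does not require contacting another shard; reaching the target vertex of an edge still might.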
Streams are a type of collection in C8 that capture data-in-motion. Messages are sent via streams by publishers to consumers who then do something with the message. Streams can be created via client drivers (pyC8), REST API or the web console.
Streams unify queuing and pub-sub messaging into a single messaging model, which gives users the flexibility to consume messages in the way that best fits the use case at hand.
A stream is a named channel for sending messages. Each stream is backed by a distributed append-only log and can be local (at one edge location only) or global (across all edge locations in the Super Fabric). Similarly the streams can be persistent or non-persistent.
Messages from publishers are only stored once on a stream, and can be consumed as many times as necessary by consumers. The stream is the source of truth for consumption. Although messages are only stored once on the stream, there can be different ways of consuming these messages.
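The "stored once, consumed many times" idea can be sketched as an append-only list plus one read position per subscription. This is an in-memory toy, not the pyC8 or REST API; the `Stream` class and its method names are hypothetical.

```python
class Stream:
    """In-memory stand-in for a stream backed by an append-only log."""

    def __init__(self):
        self.log = []      # each published message is stored exactly once
        self.cursors = {}  # subscription name -> offset of next unread message

    def publish(self, message):
        self.log.append(message)

    def subscribe(self, name):
        # A new subscription starts reading from the beginning of the log.
        self.cursors.setdefault(name, 0)

    def read(self, name):
        """Return the next unread message for a subscription, or None."""
        offset = self.cursors[name]
        if offset >= len(self.log):
            return None
        self.cursors[name] = offset + 1
        return self.log[offset]


stream = Stream()
stream.subscribe("billing")
stream.subscribe("audit")
stream.publish("order-created")
stream.publish("order-paid")

# Each subscription consumes the same stored messages independently.
billing_first = stream.read("billing")
billing_second = stream.read("billing")
audit_first = stream.read("audit")
print(billing_first, billing_second, audit_first, len(stream.log))
```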
Consumers are grouped together for consuming messages. Each group of consumers is a subscription on a stream. Each consumer group can have its own way of consuming the messages—exclusively, shared, or failover.
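This page does not define the three consumption modes in detail. The sketch below assumes the semantics commonly used in messaging systems: an exclusive subscription allows a single consumer, a shared subscription spreads messages across consumers in round-robin order, and a failover subscription delivers everything to one active consumer and switches to a standby if that consumer fails. The `dispatch` function and its parameters are hypothetical.

```python
def dispatch(messages, consumers, mode, failed=()):
    """Assign each message to a consumer according to the subscription mode.

    Returns a list of (consumer, message) pairs. Semantics are assumed,
    not taken from the platform documentation.
    """
    healthy = [c for c in consumers if c not in failed]
    if not healthy:
        raise RuntimeError("no consumer available")

    if mode == "exclusive":
        if len(consumers) > 1:
            raise ValueError("exclusive subscriptions allow one consumer")
        return [(healthy[0], m) for m in messages]
    if mode == "shared":
        # Round-robin across all healthy consumers.
        return [(healthy[i % len(healthy)], m) for i, m in enumerate(messages)]
    if mode == "failover":
        # First healthy consumer is active; the rest are standbys.
        return [(healthy[0], m) for m in messages]
    raise ValueError(f"unknown mode: {mode}")


messages = ["m1", "m2", "m3"]
shared = dispatch(messages, ["c1", "c2"], "shared")
failover = dispatch(messages, ["c1", "c2"], "failover", failed={"c1"})
print(shared)
print(failover)
```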
When your app polls for data, it becomes slow, unscalable, and cumbersome to maintain. C8 makes building realtime apps dramatically easier. It is a great choice when your applications could benefit from realtime feeds to your data.
The query-response database access model works well on the web because it maps directly to HTTP’s request-response. However, modern applications require sending data directly to the client in realtime. Use cases that can benefit from C8 realtime push architecture include:
- Collaborative web and mobile apps
- Streaming analytics apps
- Multiplayer games
- Realtime marketplaces
- Connected devices
For example, when a user changes the position of a button in a collaborative design app, the server has to notify other users that are simultaneously working on the same project. Web browsers support these use cases via WebSockets and long-lived HTTP connections, but adapting database systems to realtime needs still presents a huge engineering challenge.
C8 is a database designed specifically to push data to applications in realtime across multiple data centers. It dramatically reduces the time and effort necessary to build scalable realtime apps.
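The contrast between polling and push can be sketched with a small observer pattern: clients register a callback once and are notified on every write, instead of repeatedly asking whether anything changed. This is a single-process illustration only; the `ObservableCollection` class is hypothetical and says nothing about how C8 delivers updates over the network.

```python
class ObservableCollection:
    """Toy collection that pushes every change to registered listeners."""

    def __init__(self):
        self.documents = {}
        self.listeners = []

    def on_change(self, callback):
        # Register a callable that receives (key, document) on each write.
        self.listeners.append(callback)

    def upsert(self, key, document):
        self.documents[key] = document
        for callback in self.listeners:
            callback(key, document)


# Two collaborators watching the same design project (names are made up).
seen_by_ana = []
seen_by_raj = []

layout = ObservableCollection()
layout.on_change(lambda key, doc: seen_by_ana.append((key, doc)))
layout.on_change(lambda key, doc: seen_by_raj.append((key, doc)))

# One user moves a button; both collaborators are notified without polling.
layout.upsert("button-1", {"x": 120, "y": 48})
print(seen_by_ana == seen_by_raj, len(seen_by_ana))
```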
RESTQL in the C8 platform enables you to build and execute CRUD and processing logic close to where the data resides. Any RESTQL is automatically available via a secure REST API, enabling you to consume events from the web, streams and collections without repetitive boilerplate coding.
Your development agility comes from building systems composed of small, independent units of functionality focused on doing one thing well. RESTQL lets you build and deploy services at the level of a single function, not at the level of entire applications, containers, or VMs.
What you can do with RESTQL:
- Serverless application backends
- Real-time data processing (ETL, Transformations, Aggregations etc)
- Data Pipelines
- Simplest way to run your code inside the GeoFabric.
- No servers to provision, manage, patch or update
- Automatic scaling, highly available and fault tolerant
C8 fabric provides the best capabilities of traditional relational and non-relational databases and messaging systems.
| Capabilities | Relational databases | NoSQL databases | Streams | C8 Geo Fabric |
|---|---|---|---|---|
| Latency guarantees | No | Yes | Yes | Bounded latencies |
| Data model + API | Relational (SQL) | Multi-model + OSS API | Messages | Multi-model, C8QL, SQL (coming soon) |
| Real-time (push-based) updates | No | No | Yes | Yes |
| Geospatial support | No | No | N/A | Yes |
Solutions that benefit from C8 fabric
Web, mobile, gaming, finance, CDN, analytics and IoT applications that need to handle massive amounts of data, reads, and writes at scale, with near-real-time response times for a variety of data, will benefit from C8 fabric's high availability, high throughput, low latency, and real-time converged database and streaming capabilities.
Key features include:
- Flexible data modeling: model your data as a combination of key-value pairs, documents or graphs (graphs are well suited to social relations).
- Powerful query language to retrieve and modify data.
- Indexing: various types of indexes, including hash, skiplist, geo, persistent and full-text.
- Graphs: treat relationships between data as being as important as the data itself.
- Transactions: run queries on multiple documents or collections with optional transactional consistency and isolation.
- Replication and Sharding: spread bigger datasets across multiple servers with built-in replication.
- [GeoFabric Streams](https://developer.document360.io/docs/overview-14): for low latency, high throughput global pub-sub messaging and message queueing.