Introduction
  • Updated on 12 Jun 2019

Overview

Welcome to the Macrometa Geo-distributed Fast Data Platform API documentation!

Today’s applications are required to be highly responsive and always online. To achieve low latency and high availability, instances of these applications need to be deployed in datacenters that are close to their users. Applications need to respond in real time to large changes in usage at peak hours, store ever increasing volumes of data, and make this data available to users in milliseconds.

Macrometa Geo-distributed Fast Data Platform (C8) is a combination of:

  1. A geo-distributed database, i.e., a multi-model, multi-master, global and geo-fenced real-time database.

  2. Global streams, i.e., global and geo-fenced streams that provide pub/sub, queuing and event processing.

  3. Compute for execution on the edge via RESTQL.

The platform is designed to span hundreds of worldwide locations (PoPs) and present a single global, multi-master, real-time data platform (database and streams).

A fabric is a collection of edge data centers linked together as a single, high-performance cloud computing system consisting of storage, networking and processing functions. A fabric is created when a tenant account is provisioned with the edge locations. Each fabric contains Collections, Streams, Functions and Geo Fabrics.

Geo Fabrics are subsets of the fabric. Each one is composed of one or more of the fabric's edge locations, as defined by the tenant. A geo fabric contains the following services:

  • Collections are groupings of JSON documents, similar to tables in an RDBMS. You can create any number of collections in a geo fabric, and a collection can hold any number of documents.

  • Streams are a type of collection that capture data in motion. Streams support both pub-sub and queuing models. Messages are sent via streams by publishers to consumers who then do something with the message.

  • RESTQL lets users define compute as RESTQL. Once deployed, the platform orchestrates the RESTQL to execute on demand (i.e., serverless) in edge locations in response to requests from clients.

Data Models

In C8, documents are grouped into collections. A collection contains zero or more documents.

Note:
If you are familiar with relational database management systems (RDBMS) then it is safe to compare collections to tables and documents to rows. The difference is that in a traditional RDBMS, you have to define columns before you can store records in a table. Such definitions are also known as schemas.

C8 is schema-less, which means that there is no need to define what attributes a document can have. Every single document can have a completely different structure and still be stored together with other documents in a single collection.

Note:
In practice, there will be common denominators among the documents in a collection, but the database system itself doesn't force you to limit yourself to a certain data structure.

There are two types of collections:

  • Document collections - Also referred to as vertex collections in the context of graphs.
  • Edge collections - These collections store documents as well, but they include two special attributes, _from and _to, which are used to create relations between documents.

Usually, two documents (vertices) stored in document collections are linked by a document (edge) stored in an edge collection. This is the graph data model.

Note:
The graph data model follows the mathematical concept of a directed, labeled graph, except that edges don't just have labels, but are full-blown documents.
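As a minimal sketch of the two collection types, the snippet below models two vertex documents and one edge document as plain Python dictionaries. The collection name "people", the "collection/key" handle format used in _from and _to, and the helper is_edge are assumptions made for illustration; only the _key, _from and _to attributes come from the text above.

```python
# Two vertex documents, as they might be stored in a hypothetical
# document collection named "people".
alice = {"_key": "alice", "name": "Alice"}
bob = {"_key": "bob", "name": "Bob"}

# One edge document. The "collection/key" handle format used for
# _from and _to is an assumption made for this illustration.
knows = {
    "_from": "people/" + alice["_key"],
    "_to": "people/" + bob["_key"],
    "label": "knows",   # edges can carry a label ...
    "since": 2015,      # ... and any other attributes, like a full document
}


def is_edge(doc):
    """Hypothetical helper: a document is an edge if it has _from and _to."""
    return "_from" in doc and "_to" in doc
```

The edge is itself a full document, so it can hold arbitrary attributes in addition to the two that define the relation.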

Collections exist inside geofabrics. There can be one or many geofabrics. Different geofabrics are usually used for multi-tenant setups, as the data inside each one (collections, documents, etc.) is isolated from the others.

Note:
The default geofabric _system is special, because it cannot be removed. Users and permissions can be managed in this geofabric. Their credentials are valid for all geofabrics under that tenant.

C8 Geofabric supports multiple types of data models.

Key/Value model

The key/value store data model is the easiest to scale. In C8, this is implemented in the sense that a document collection always has a primary key _key attribute and in the absence of further secondary indexes the document collection behaves like a simple key/value store.

The only operations that are possible in this context are single key lookups and key/value pair insertions and updates. If _key is the only sharding attribute then the sharding is done with respect to the primary key and all these operations scale linearly.

If the sharding is done using different shard keys, then a lookup of a single key involves asking all shards and thus does not scale linearly.
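The difference between the two cases can be sketched in a few lines. This is an illustration of hash-based sharding in general, not of how C8 assigns shards internally; NUM_SHARDS, shard_for and shards_to_ask are hypothetical names.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count for this illustration


def shard_for(value):
    """Map a shard-key value to a shard number with a stable hash."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_to_ask(key, sharded_by_key):
    """Return the shards a single-key lookup has to contact."""
    if sharded_by_key:
        # _key is the shard key: the owning shard can be computed directly.
        return [shard_for(key)]
    # Another attribute is the shard key: the document could live on any
    # shard, so every shard has to be asked.
    return list(range(NUM_SHARDS))
```

With _key as the shard key, a lookup touches exactly one shard no matter how many shards exist. Otherwise the cost of a lookup grows with the number of shards.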

Document model

The documents you can store in a regular collection closely follow the JSON format.

  • A document contains zero or more attributes with each of these attributes having a value. A value can either be an atomic type, i.e. number, string, boolean or null, or a compound type, i.e. an array or embedded document/object. Arrays and sub-objects can contain all of these types, which means that arbitrarily nested data structures can be represented in a single document.

  • Documents are grouped into collections. A collection contains zero or more documents. If you are familiar with RDBMS, then it is safe to compare collections to tables, and documents to rows.

  • In a traditional RDBMS, you have to define columns before you can store records in a table. Such definitions are also known as schemas. Collections are schema-less, and there is no need to define what attributes a document must have. Documents can have a completely different structure and still be stored together with other documents in a single collection.

  • In practice, there will be common denominators among the documents in a collection, but C8 itself doesn't force you to limit yourself to a certain data structure.
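The points above can be illustrated with plain Python data. In this sketch a collection is modeled as a simple list of dictionaries, and the attribute names other than _key are made up for the example.

```python
import json

# Two documents with different structures, held in the same collection.
collection = [
    {
        "_key": "u1",
        "name": "Ada",                      # string
        "tags": ["admin", "dev"],           # array
        "address": {                        # embedded object
            "city": "London",
            "geo": [51.5, -0.12],           # array nested inside an object
        },
    },
    {
        "_key": "u2",
        "nickname": "bob",
        "active": True,                     # boolean
        "last_login": None,                 # null
    },
]

# The only attribute the two documents have in common is _key.
common_attributes = set.intersection(*(set(doc) for doc in collection))

# Both documents are valid JSON and survive a round trip unchanged.
restored = json.loads(json.dumps(collection))
```

Nothing has to be declared up front for the two documents to coexist, which is the sense in which collections are schema-less.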

Graph model

You can turn your documents into graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the idea of a graph, which directly relates data items in the database.

  • A graph collection is simply a regular collection but with some special attributes that enable you to create graph queries and analyze the relationships between objects.

  • In SQL databases, you have the notion of a relation table to store n:m relationships between two data tables. An edge collection is somewhat similar to these relation tables; vertex collections resemble the data tables with the objects to connect.

  • While simple graph queries with a fixed number of hops via the relation table may be doable in SQL with several nested joins, graph databases can handle an arbitrary number of these hops over edge collections. This is called traversal. Edges in one edge collection may also point to several vertex collections. It is common to have attributes attached to edges, e.g. a label naming the interconnection.

  • Edges have a direction, with their relations _from and _to pointing from one document to another document stored in vertex collections. In queries you can define in which directions the edge relations may be followed.

Graph databases are particularly good at queries that involve paths of a priori unknown length, such as finding the shortest path between two vertices, or finding all paths that match a certain pattern starting at a given vertex.
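To make traversal concrete, here is a small in-memory sketch of a shortest-path search over edge documents, using breadth-first search. It is not the query engine's implementation. The "cities" data and the direction names "outbound", "inbound" and "any" are assumptions chosen for the example.

```python
from collections import deque

# Edge documents of a hypothetical edge collection.
edges = [
    {"_from": "cities/a", "_to": "cities/b"},
    {"_from": "cities/b", "_to": "cities/c"},
    {"_from": "cities/a", "_to": "cities/d"},
    {"_from": "cities/d", "_to": "cities/c"},
    {"_from": "cities/c", "_to": "cities/e"},
]


def neighbours(vertex, direction):
    """Yield vertices reachable over one edge in the given direction."""
    for edge in edges:
        if direction in ("outbound", "any") and edge["_from"] == vertex:
            yield edge["_to"]
        if direction in ("inbound", "any") and edge["_to"] == vertex:
            yield edge["_from"]


def shortest_path(start, goal, direction="outbound"):
    """Breadth-first search; the number of hops is not known in advance."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1], direction):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path in this direction
```

Following edges outbound from cities/a reaches cities/e in three hops. The reverse search only succeeds when edges may be followed inbound, which shows why the direction matters.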

However, if the vertices and edges along the occurring paths are distributed across the cluster, then a lot of communication is necessary between nodes, and performance suffers.

To achieve good performance at scale, it is therefore necessary to get the distribution of the graph data across the shards in the cluster right. Most of the time, the application developers and users of C8 know best how their graphs are structured.

Therefore, C8 allows users to specify the attributes by which the graph data is sharded. A useful first step is usually to make sure that the edges originating at a vertex reside on the same cluster node as the vertex.
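One way to picture that first step: if an edge is placed using the same function that places its source vertex, expanding a vertex's outgoing edges never leaves its node. The sketch below assumes a simple hash-based placement. NUM_NODES and all function names are hypothetical, and this is not a description of C8's actual placement logic.

```python
import zlib

NUM_NODES = 3  # hypothetical cluster size

edges = [
    {"_from": "people/alice", "_to": "people/bob"},
    {"_from": "people/alice", "_to": "people/carol"},
    {"_from": "people/bob", "_to": "people/carol"},
]


def node_for_vertex(vertex_id):
    """Place a vertex on a node using a stable hash of its id."""
    return zlib.crc32(vertex_id.encode("utf-8")) % NUM_NODES


def node_for_edge(edge):
    """Place an edge on the node of the vertex it originates from."""
    return node_for_vertex(edge["_from"])


def remote_lookups(vertex_id):
    """Count outgoing edges stored on a different node than the vertex."""
    home = node_for_vertex(vertex_id)
    return sum(
        1
        for edge in edges
        if edge["_from"] == vertex_id and node_for_edge(edge) != home
    )
```

With this placement, remote_lookups is zero for every vertex. Had the edges been placed by _to or by their own key, some of them could land on other nodes.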

Stream model

Streams are a type of collection in C8 that capture data-in-motion. Messages are sent via streams by publishers to consumers who then do something with the message. Streams can be created via client drivers (pyC8), REST API or the web console.

Streams unify queuing and pub-sub messaging into a single messaging model that gives users the flexibility to consume messages in the way that best fits the use case at hand.

producer→stream→subscription→consumer

  • A stream is a named channel for sending messages. Each stream is backed by a distributed append-only log and can be local (at one edge location only) or global (across all edge locations in the Super Fabric). Similarly the streams can be persistent or non-persistent.

  • Messages from publishers are only stored once on a stream, and can be consumed as many times as necessary by consumers. The stream is the source of truth for consumption. Although messages are only stored once on the stream, there can be different ways of consuming these messages.

  • Consumers are grouped together for consuming messages. Each group of consumers is a subscription on a stream. Each consumer group can have its own way of consuming the messages—exclusively, shared, or failover.
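The sketch below simulates how one stored message log could be delivered to the consumers of a subscription under each of the three modes. The text above does not define the modes in detail, so the behaviour shown is an assumption based on how these modes commonly work in pub-sub systems: exclusive allows a single consumer, shared distributes messages round-robin, and failover sends everything to one active consumer while the others stand by. The function dispatch is hypothetical and is not part of any C8 driver.

```python
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Return {consumer: [messages received]} for one subscription."""
    if not consumers:
        raise ValueError("a subscription needs at least one consumer")
    delivered = {consumer: [] for consumer in consumers}
    if mode == "exclusive":
        # Assumed semantics: only one consumer may attach.
        if len(consumers) != 1:
            raise ValueError("exclusive allows exactly one consumer")
        delivered[consumers[0]] = list(messages)
    elif mode == "shared":
        # Assumed semantics: messages are spread round-robin.
        for message, consumer in zip(messages, cycle(consumers)):
            delivered[consumer].append(message)
    elif mode == "failover":
        # Assumed semantics: the first consumer is active, the rest stand by.
        delivered[consumers[0]] = list(messages)
    else:
        raise ValueError("unknown mode: " + mode)
    return delivered


# The stream stores each message once; every subscription reads the same log.
log = ["m1", "m2", "m3"]
audit = dispatch(log, ["auditor"], "exclusive")
workers = dispatch(log, ["w1", "w2"], "shared")
standby = dispatch(log, ["primary", "backup"], "failover")
```

Each subscription sees all three messages, but the way they are divided among its consumers depends on the mode.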

Real-time Database

When your app polls for data, it becomes slow, unscalable, and cumbersome to maintain. C8 makes building realtime apps dramatically easier. It is a great choice when your applications could benefit from realtime feeds to your data.

The query-response database access model works well on the web because it maps directly to HTTP’s request-response. However, modern applications require sending data directly to the client in realtime. Use cases that can benefit from C8 realtime push architecture include:

  • Collaborative web and mobile apps
  • Streaming analytics apps
  • Multiplayer games
  • Realtime marketplaces
  • Connected devices

For example, when a user changes the position of a button in a collaborative design app, the server has to notify other users that are simultaneously working on the same project. Web browsers support these use cases via WebSockets and long-lived HTTP connections, but adapting database systems to realtime needs still presents a huge engineering challenge.

The C8 database is designed specifically to push data to applications in realtime across multiple data centers. It dramatically reduces the time and effort necessary to build scalable realtime apps.
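The difference between polling and push can be sketched with a small in-memory example. ObservableCollection is a hypothetical class written for this illustration. It is not a C8 API and it ignores networking entirely.

```python
class ObservableCollection:
    """Toy collection that notifies listeners whenever a document changes."""

    def __init__(self):
        self._docs = {}
        self._listeners = []

    def subscribe(self, callback):
        """Register a callback invoked as callback(key, document) on change."""
        self._listeners.append(callback)

    def upsert(self, key, document):
        """Store the document, then push the change to every listener."""
        self._docs[key] = document
        for callback in self._listeners:
            callback(key, document)


# Two collaborators watch the same design project.
seen_by_ann = []
seen_by_ben = []
project = ObservableCollection()
project.subscribe(lambda key, doc: seen_by_ann.append((key, doc)))
project.subscribe(lambda key, doc: seen_by_ben.append((key, doc)))

# One user moves a button; both listeners are notified without polling.
project.upsert("button-1", {"x": 120, "y": 40})
```

In the push model the write itself triggers the notification, so clients do not need to re-query on a timer to find out what changed.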

RESTQL

RESTQL in the C8 platform enables you to build and execute CRUD and processing logic close to where the data resides. Any RESTQL is automatically available via a secure REST API, enabling you to consume events from the web, streams and collections without repetitive boilerplate coding.

Your development agility comes from building systems composed of small, independent units of functionality focused on doing one thing well. RESTQL lets you build and deploy services at the level of a single function, not at the level of entire applications, containers, or VMs.

What you can do with RESTQL:

  • Serverless application backends
  • Real-time data processing (ETL, Transformations, Aggregations etc)
  • Data Pipelines

Highlights:

  • Simplest way to run your code inside the GeoFabric.
  • No servers to provision, manage, patch or update
  • Automatic scaling, highly available and fault tolerant

Capability comparison

C8 fabric provides the best capabilities of traditional relational and non-relational databases and messaging systems.

Capabilities                   | Relational databases | NoSQL databases       | Streams  | C8 Geo Fabric
Global distribution            | No                   | No                    | No       | Yes
Horizontal scale               | No                   | Yes                   | Yes      | Yes
Latency guarantees             | No                   | Yes                   | Yes      | Bounded latencies
High availability              | No                   | Yes                   | Yes      | Yes
Data model + API               | Relational (SQL)     | Multi-model + OSS API | Messages | Multi-model, C8QL, SQL (coming soon)
Real Time (push based) updates | No                   | No                    | Yes      | Yes
Conflict Resolution            | No                   | No                    | N/A      | Yes
Geo Fencing                    | No                   | No                    | No       | Yes
Geo Spatial support            | No                   | No                    | N/A      | Yes
Multi-tenancy                  | No                   | No                    | No       | Yes

Solutions that benefit from C8 fabric

Any web, mobile, gaming, finance, CDN, analytics or IoT application that needs to handle massive amounts of data, reads and writes at scale, with near-real-time response times for a variety of data, will benefit from C8 fabric's high availability, high throughput, low latency, and real-time converged database and streaming capabilities.

Next Steps

C8 is an edge-native, coordination-free streaming and multi-model real-time database with flexible data models for documents, graphs and key-values. Build high-performance applications using a convenient SQL-like query language or JavaScript extensions. Use ACID transactions if you require them. Scale horizontally within and across regions, and vertically within a region, with a few mouse clicks.

Key features include:

  • Flexible data modeling: model your data as a combination of key-value pairs, documents or graphs (graphs are well suited to social relations).
  • Powerful query language to retrieve and modify data.
  • Indexing: various index types, including hash, skiplist, geo, persistent and full-text.
  • Graphs: treat relationships between data as being as important as the data itself.
  • Transactions: run queries on multiple documents or collections with optional transactional consistency and isolation.
  • Replication and Sharding: spread bigger datasets across multiple servers with built-in replication.
  • GeoFabric Streams (https://developer.document360.io/docs/overview-14): for low-latency, high-throughput global pub-sub messaging and message queueing.