Getting started with Macrometa
  • Updated on 19 Jun 2019

Overview

Welcome to the Macrometa Geo-distributed Fast Data Platform API documentation!

In this guide, we first introduce the core concepts and then cover how to:

  • Use the web interface
  • Store and query data
  • Edit and remove existing data
  • Handle streams
  • Use RESTQL as a backend as a service

Core Concepts

Today’s applications are required to be highly responsive and always online. To achieve low latency and high availability, instances of these applications need to be deployed in datacenters that are close to their users. Applications need to respond in real time to large changes in usage at peak hours, store ever increasing volumes of data, and make this data available to users in milliseconds.

Macrometa Geo-distributed Fast Data Platform (C8) is a combination of

  1. Geo Distributed Database i.e., a multi-model, multi-master, global & geo-fenced real-time database

  2. Global streams i.e., global & geo-fenced streams to provide pub/sub, queuing and event processing.

  3. Compute for execution on the edge via RESTQL.

The platform is designed to span hundreds of locations (PoPs) worldwide and present them as one global, multi-master, real-time data platform (database and streams).

Fabric is a collection of edge data centers linked together as a single, high performance cloud computing system consisting of storage, networking and processing functions. A fabric is created when a tenant account is provisioned with the edge locations. Each fabric contains Collections, Streams, Functions and Geo Fabrics.

Geo Fabrics are subsets of the fabric and can be composed of one or more edge locations in the fabric (defined by the tenant). Different geo fabrics are usually used to isolate the data inside them (collections, documents etc.) from one another. A geo fabric contains the following:

  • Collections are groupings of JSON documents and are like tables in an RDBMS. You can create any number of collections in a geo fabric, and a collection can have any number of documents.

  • Streams are a type of collection that capture data in motion. Streams support both pub-sub and queuing models. Messages are sent via streams by publishers to consumers who then do something with the message.

  • RESTQL lets users define data management and processing as stored queries. Once deployed, the platform orchestrates the RESTQL to execute on demand (i.e. serverless) from any edge location in response to requests from clients. Your development agility comes from building systems composed of small, independent units of functionality focused on doing one thing well. RESTQL lets you build and deploy services at the level of a single function, not at the level of entire applications, containers, or VMs.

Note:
The default geofabric _system is special because it cannot be removed. Users and permissions are managed in this geofabric. Their credentials are valid for all geofabrics under that tenant.

Real-time Database:

When your app polls for data, it becomes slow, unscalable, and cumbersome to maintain. C8 makes building realtime apps dramatically easier. C8 fabrics can push data to applications in realtime across multiple data centers. It dramatically reduces the time and effort necessary to build scalable realtime apps.

Accessing C8

C8 speaks HTTP / REST, but you can use the graphical web interface to keep it simple. If you are a developer, you might prefer the drivers over the GUI.

When you start using C8 in your project, you will likely use an official or community-made driver written in the same language as your project. Therefore, you can most certainly ignore the HTTP API unless you want to write a driver yourself or explicitly want to use the raw interface.

To get familiar with the C8 system, you can even put drivers aside and use the web interface for basic interaction. You can access it in your browser at https://try.macrometa.io.

By default, authentication is enabled. The default user is root.

web-login-1

Next you will be asked which geofabric to use. Every tenant account comes with a default _system geofabric. Select this geofabric to continue.

web-login-2

You should then be presented with a dashboard like this:

web-dashboard-1

Note
The pin icons on the map represent all the locations to which the data in this geo fabric will be replicated and converged automatically. Applications can read and write locally from those locations.

Data Models

C8 Geofabric supports multiple types of data models.

Key/Value model

The key/value store data model is the easiest to scale. In C8, a document collection always has a primary key attribute, _key, and in the absence of further secondary indexes the document collection behaves like a simple key/value store.

The only operations that are possible in this context are single key lookups and key/value pair insertions and updates. If _key is the only sharding attribute then the sharding is done with respect to the primary key and all these operations scale linearly.

If the sharding is done using different shard keys, then a lookup of a single key involves asking all shards and thus does not scale linearly.
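
The difference between the two sharding cases can be illustrated with a small in-memory model. This is only a sketch and not how C8 is implemented: the shard count, the CRC32 hash and the function names are assumptions made for the example.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count for this sketch


def shard_for(value: str) -> int:
    """Map a shard-key value to a shard number with a stable hash."""
    return zlib.crc32(value.encode("utf-8")) % NUM_SHARDS


def build_shards(docs, shard_key):
    """Distribute documents over shards by the given shard-key attribute."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for doc in docs:
        shards[shard_for(str(doc[shard_key]))][doc["_key"]] = doc
    return shards


def lookup_by_key(shards, key, sharded_on_key):
    """Return (document, number of shards contacted) for a _key lookup."""
    if sharded_on_key:
        # The shard is computable from _key alone, so one shard is asked.
        return shards[shard_for(key)].get(key), 1
    # Sharded on another attribute: any shard may hold the key, ask them all.
    found = None
    contacted = 0
    for shard in shards:
        contacted += 1
        if key in shard:
            found = shard[key]
    return found, contacted


docs = [
    {"_key": "1", "city": "Metropolis"},
    {"_key": "2", "city": "Gotham"},
    {"_key": "3", "city": "Smallville"},
]
by_key = build_shards(docs, "_key")
by_city = build_shards(docs, "city")
print(lookup_by_key(by_key, "2", sharded_on_key=True))
print(lookup_by_key(by_city, "2", sharded_on_key=False))
```

With _key as the shard key, the lookup touches one shard regardless of how many shards exist. With any other shard key, the number of shards contacted grows with the cluster.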

Document model

The documents you can store in a regular collection closely follow the JSON format. A document contains zero or more attributes with each of these attributes having a value.

Documents are grouped into collections. A collection contains zero or more documents. If you are familiar with RDBMS, then it is safe to compare collections to tables, and documents to rows.

In a traditional RDBMS, you have to define columns before you can store records in a table. Collections are schema-less, and there is no need to define what attributes a document must have. Documents can have a completely different structure and still be stored together with other documents in a single collection.

In practice, there will be common denominators among the documents in a collection, but C8 itself doesn't force you to limit yourself to a certain data structure.

Graph model

You can turn your documents into graph structures for semantic queries with nodes, edges and properties to represent and store data. A key concept of the system is the idea of a graph, which directly relates data items in the database.

In SQL databases, you have the notion of a relation table to store n:m relationships between two data tables. An edge collection is somewhat similar to these relation tables; vertex collections resemble the data tables with the objects to connect.

While simple graph queries with a fixed number of hops via the relation table may be doable in SQL with several nested joins, graph databases can handle an arbitrary number of these hops over edge collections.

Graph data models are particularly good at queries that involve paths of an a priori unknown length, for example finding the shortest path between two vertices, or finding all paths that match a certain pattern starting at a given vertex.
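
To make the "unknown number of hops" idea concrete, here is a minimal sketch of a shortest-path search over edge documents, written in plain Python rather than C8QL. It assumes edge documents link vertices through _from and _to attributes; the vertex names are made up for the example.

```python
from collections import deque

# Edge documents link two vertices, similar to rows in an n:m relation table.
# The collection name and keys here are hypothetical.
edges = [
    {"_from": "persons/alice", "_to": "persons/bob"},
    {"_from": "persons/bob", "_to": "persons/carol"},
    {"_from": "persons/carol", "_to": "persons/dave"},
    {"_from": "persons/alice", "_to": "persons/erin"},
    {"_from": "persons/erin", "_to": "persons/dave"},
]


def shortest_path(edges, start, goal):
    """Breadth-first search following edges in the _from -> _to direction."""
    neighbours = {}
    for edge in edges:
        neighbours.setdefault(edge["_from"], []).append(edge["_to"])

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no path exists


print(shortest_path(edges, "persons/alice", "persons/dave"))
```

The search does not need to know in advance how many hops separate the two vertices, which is exactly what a fixed chain of SQL joins cannot express.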

Stream model

Streams are a type of collection in C8 that capture data-in-motion. Messages are sent via streams by publishers to consumers who then do something with the message. Streams can be created via client drivers (pyC8, jsC8), REST API or the web console.

Streams unify queuing and pub-sub messaging into a single messaging model that gives users the flexibility to consume messages in the way that best fits the use case at hand.

producer→stream→subscription→consumer

  • A stream is a named channel for sending messages. Each stream is backed by a distributed append-only log and can be local (at one edge location only) or global (across all edge locations in the Fabric).

Messages from publishers are only stored once on a stream, and can be consumed as many times as necessary by consumers. The stream is the source of truth for consumption. Although messages are only stored once on the stream, there can be different ways of consuming these messages.

Consumers are grouped together for consuming messages. Each group of consumers is a subscription on a stream. Each consumer group can have its own way of consuming the messages—exclusively, shared, or failover.
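
The three consumption modes can be sketched with a small in-memory model. This is not the C8 streams API. It assumes the modes behave as they commonly do in pub/sub systems: exclusive allows a single consumer, shared distributes messages round-robin across consumers, and failover delivers to one active consumer and switches to the next one when it detaches. The class and method names are hypothetical.

```python
class Subscription:
    """A named group of consumers attached to one stream (in-memory sketch)."""

    def __init__(self, mode):
        if mode not in ("exclusive", "shared", "failover"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.consumers = []   # consumer names, in attach order
        self.delivered = {}   # consumer name -> list of received messages
        self._next = 0        # round-robin cursor used by shared mode

    def attach(self, name):
        if self.mode == "exclusive" and self.consumers:
            raise RuntimeError("exclusive subscription already has a consumer")
        self.consumers.append(name)
        self.delivered.setdefault(name, [])

    def detach(self, name):
        self.consumers.remove(name)

    def dispatch(self, message):
        """Deliver one message and return the name of the receiving consumer."""
        if not self.consumers:
            return None
        if self.mode == "shared":
            target = self.consumers[self._next % len(self.consumers)]
            self._next += 1
        else:
            # exclusive: the only consumer; failover: the first one still attached
            target = self.consumers[0]
        self.delivered[target].append(message)
        return target


shared = Subscription("shared")
shared.attach("a")
shared.attach("b")
for msg in ["m1", "m2", "m3", "m4"]:
    shared.dispatch(msg)
print(shared.delivered)

failover = Subscription("failover")
failover.attach("a")
failover.attach("b")
failover.dispatch("m1")
failover.detach("a")   # the active consumer goes away
failover.dispatch("m2")
print(failover.delivered)
```

Because each subscription keeps its own delivery state, several subscriptions with different modes can read the same stream independently.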

Creating Collections

GeoFabrics are sets of collections. Collections store records, which are referred to as documents. Collections are the equivalent of tables in RDBMS, and documents can be thought of as rows in a table. The difference is that you don't define what columns (or rather attributes) there will be in advance.

Every document in any collection can have arbitrary attribute keys and values. In practice, documents in a single collection will likely have a similar structure, but the geofabric itself does not impose one and will operate stably and fast no matter what your data looks like.

For now, you can stick with the default _system geofabric and use the web interface to create collections and documents.

Start by clicking the COLLECTIONS menu entry, then the Add Collection tile. Give it a name, e.g. addresses, leave the other settings unchanged (we want it to be a document collection) and Save it. A new tile labeled addresses should show up, which you can click to open.

web-document-1

Note:
The collection addresses is automatically created in all locations where the geofabric is available.

web-collection-1

Similarly, you can create edge collections for graph data models. Start by clicking the COLLECTIONS menu entry, then the Add Collection tile. Give it a name, e.g. edges, set the collection type to edge instead of document, leave the other settings unchanged and Save it. A new tile labeled edges should show up, which you can click to open.

web-collections-3

Creating Documents

There are no documents yet in the collection we just created. Click the blue circle with the white plus on the right-hand side to create a first document in this collection. A dialog will ask you for a _key. You can leave the field blank and click Create to let the system assign an automatically generated (unique) key.

Note
The _key property is immutable, which means you cannot change it once the document is created. What you can use as a document key is described in the naming conventions.

Aside from a few system attributes, there is nothing in this document yet. Let's add a custom attribute by clicking the icon to the left of (empty object), then Append. Two input fields will become available, FIELD (attribute key) and VALUE (attribute value). Type firstname as key and your first name as value. Append another attribute, name it lastname and set it to your last name. Click Save to persist the changes.

web-document-2

If you click on Collection: addresses at the top on the right-hand side of the Macrometa logo, the document browser will show the documents in the addresses collection and you will see the document you just created in the list.

Similarly you can create documents in edge collections.

web-edgedocument-1

Aside from a few system attributes, there is nothing in this document yet. Let's add a custom attribute by clicking the icon to the left of (empty object), then Append. Two input fields will become available, FIELD (attribute key) and VALUE (attribute value). Type amount as key and add some amount as value. Append another attribute, name it status and set it to disputed. Click Save to persist the changes.

web-edgedocument-2

If you click on Collection: edges at the top on the right-hand side of the Macrometa logo, the document browser will show the documents in the edges collection and you will see the document you just created in the list.

web-edgedocuments-3

Query Data

Time to retrieve our document using C8QL, the C8 query language. We can directly look up the document we created via the _id, but there are also other options. Click the QUERIES menu entry to bring up the query editor and type the following (adjust the document ID to match your document):

RETURN DOCUMENT("addresses/1040471")

Then click Execute to run the query. The result appears below the query editor:

[
  {
    "_id": "addresses/1040471",
    "_key": "1040471",
    "_rev": "_YxnCusS--_",
    "email": "clark.kent@macrometa.co",
    "firstname": "Clark",
    "lastname": "Kent"
  }
]

web-queries-1

As you can see, the entire document including the system attributes is returned.

DOCUMENT() is a function to retrieve a single document or a list of documents of which you know the _keys or _ids. We return the result of the function call as our query result, which is our document inside of the result array. This type of query is called a data access query. No data is created, changed or deleted.
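
As a mental model, a lookup by _id splits the ID into a collection name and a _key and then does a direct key lookup. The following Python sketch imitates that with nested dictionaries. It is not the C8 implementation; the fabric variable and the document() helper are hypothetical, and the behavior for missing documents (None for a single ID, skipped in a list) is an assumption of the sketch.

```python
# In-memory stand-in for a geofabric: collection name -> {_key: attributes}.
fabric = {
    "addresses": {
        "1040471": {"_key": "1040471", "firstname": "Clark", "lastname": "Kent"},
    }
}


def document(ids):
    """Look up one document by "collection/key", or several from a list of IDs."""

    def fetch(doc_id):
        collection, _, key = doc_id.partition("/")
        doc = fabric.get(collection, {}).get(key)
        if doc is None:
            return None
        return {"_id": f"{collection}/{key}", **doc}

    if isinstance(ids, str):
        return fetch(ids)
    # For a list, keep only the documents that were found.
    return [doc for doc in map(fetch, ids) if doc is not None]


print(document("addresses/1040471"))
print(document(["addresses/1040471", "addresses/unknown"]))
```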

Insert Data

Let's insert a document using a modification query:

INSERT {firstname: "Clark", lastname: "Kent", email: "clark@macrometa.co" }
INTO addresses

web-queries-2

The query is pretty self-explanatory: the INSERT keyword tells C8 that we want to insert a document with 3 attributes in this case. INTO is a mandatory part of every INSERT operation and is followed by the collection name that we want to store the document in. Note that there are no quote marks around the collection name.

If you run the above query, the result will be an empty array because we did not specify what to return using a RETURN keyword. It is optional in modification queries, but mandatory in data access queries. Even with RETURN, the return value can still be an empty array, e.g. if the specified document was not found.

Let's add another address, but return the newly created document this time:

INSERT {firstname: "John", lastname: "Doe", email: "john.doe@macrometa.co" }
INTO addresses
RETURN NEW

NEW is a pseudo-variable, which refers to the document created by INSERT.
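
The effect of INSERT ... RETURN NEW can be pictured with a small Python sketch: the stored document is the attributes you supplied plus the system attributes the database adds. The insert() helper and the counter-based key generator are hypothetical; real keys and the _rev attribute are assigned by C8 and are not modeled here.

```python
import itertools

_key_counter = itertools.count(1)  # hypothetical key generator for the sketch


def insert(collection_name, collection, attributes):
    """Store a copy of the attributes and return the stored document (like RETURN NEW)."""
    key = str(attributes.get("_key") or next(_key_counter))
    if key in collection:
        raise KeyError(f"duplicate _key: {key}")
    new = {**attributes, "_key": key, "_id": f"{collection_name}/{key}"}
    collection[key] = new
    return new


addresses = {}
new_doc = insert("addresses", addresses, {"firstname": "John", "lastname": "Doe"})
print(new_doc)
```

Returning the new document saves a second round trip when the caller needs the generated key.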

What if we add more addresses and retrieve the list of addresses? We can formulate this with a FOR loop:

FOR address IN addresses
  RETURN address

The query iterates over every document in addresses and uses address as the variable name, which we can use to refer to the current document.

web-queries-3

You may have noticed that the order of the returned documents is not necessarily the order in which they were inserted. No order is guaranteed unless you explicitly sort the documents. We can add a SORT operation very easily:

FOR address IN addresses
  SORT address.lastname DESC
  RETURN address

We might want to limit the result set to a subset of users, based on the firstname attribute for example. Let's return addresses whose firstname is Clark:

FOR address IN addresses
  FILTER address.firstname == "Clark"
  SORT address.lastname DESC
  RETURN address
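
For readers who think in general-purpose languages, the FOR / FILTER / SORT / RETURN pipeline above corresponds roughly to the following Python. It is only an analogy over an in-memory list with made-up sample data, not something that runs against C8.

```python
addresses = [
    {"firstname": "Clark", "lastname": "Kent"},
    {"firstname": "John", "lastname": "Doe"},
    {"firstname": "Clark", "lastname": "Doe"},
]

result = sorted(
    (a for a in addresses if a["firstname"] == "Clark"),  # FILTER
    key=lambda a: a["lastname"],                          # SORT address.lastname
    reverse=True,                                         # DESC
)

for address in result:                                    # RETURN address
    print(address)
```

Note that the comparison uses ==, just as in C8QL. A single = is not a comparison operator.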

Update Data

We can do a modification query as follows:

UPDATE "1040471" WITH { lastname: "Doe" } IN addresses
RETURN NEW

UPDATE lets you partially edit an existing document. There is also REPLACE, which removes all attributes (except for _key and _id, which remain the same) and adds only the specified ones. UPDATE, on the other hand, replaces only the specified attributes and keeps everything else as is.

The UPDATE keyword is followed by the document key (or a document / object with a _key attribute) to identify what to modify. The attributes to update are written as an object after the WITH keyword. IN denotes the collection in which to perform the operation, just like INTO (both keywords are interchangeable here). If we use the NEW pseudo-variable, the full document with the changes applied is returned.

If we used REPLACE instead, the firstname and email attributes would be gone. With UPDATE, they are kept (the same would apply to any additional attributes).
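
The difference between the two operations can be shown with plain dictionaries. This is a sketch of the semantics described above, not C8 code; the update() and replace() helpers are hypothetical, and only _key and _id are treated as system attributes here.

```python
SYSTEM_ATTRIBUTES = ("_key", "_id")


def _system(doc):
    """Return the system attributes of a document, which both operations keep."""
    return {k: doc[k] for k in SYSTEM_ATTRIBUTES if k in doc}


def update(doc, changes):
    """Partial edit: overwrite only the given attributes, keep the rest."""
    return {**doc, **changes, **_system(doc)}


def replace(doc, attributes):
    """Full replacement: drop everything except the system attributes."""
    return {**attributes, **_system(doc)}


original = {
    "_key": "1040471",
    "_id": "addresses/1040471",
    "firstname": "Clark",
    "lastname": "Kent",
    "email": "clark.kent@macrometa.co",
}
print(update(original, {"lastname": "Doe"}))   # firstname and email survive
print(replace(original, {"lastname": "Doe"}))  # only lastname plus system attributes
```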

Projections

Let us run a FILTER query again, this time on lastname, and return only the first names:

FOR address IN addresses
  FILTER address.lastname == "Doe"
  RETURN address.firstname

web-queries-4

Assuming the UPDATE above has been applied, this returns the first names of all addresses whose lastname is Doe:

[
  "Clark",
  "John"
]

It is called a projection if only a subset of attributes is returned. Another kind of projection is to change the structure of the results:

FOR address IN addresses
  RETURN { name: address.firstname, email: address.email }

The query defines the output format for every user document. The first name is returned as name instead of firstname, the email keeps the attribute key in this example.

It is also possible to compute new values:

FOR address IN addresses
  RETURN CONCAT(address.firstname, ". ", address.lastname)

CONCAT() is a function that can join elements together to a string.
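
The three kinds of projection shown in this section map onto simple list comprehensions. The sketch below uses made-up sample data and plain Python; it mirrors the shape of the results, not the C8 execution.

```python
addresses = [
    {"firstname": "Clark", "lastname": "Kent", "email": "clark@macrometa.co"},
    {"firstname": "John", "lastname": "Doe", "email": "john.doe@macrometa.co"},
]

# RETURN address.firstname
first_names = [a["firstname"] for a in addresses]

# RETURN { name: address.firstname, email: address.email }
reshaped = [{"name": a["firstname"], "email": a["email"]} for a in addresses]

# RETURN CONCAT(address.firstname, ". ", address.lastname)
labels = [f'{a["firstname"]}. {a["lastname"]}' for a in addresses]

print(first_names, reshaped, labels, sep="\n")
```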

You can also do cross products by using a loop inside a loop. For example, for every document in the addresses collection, iterate over all documents again and return firstname pairs, e.g. Clark & John.

FOR address1 IN addresses
  FOR address2 IN addresses
    FILTER address1.firstname != address2.firstname
    RETURN [address1.firstname, address2.firstname]

We could also compute something new this way, for example the sum of two zip codes (assuming every document has a numeric zip attribute):

FOR address1 IN addresses
  FOR address2 IN addresses
    FILTER address1.firstname != address2.firstname
    RETURN {
        pair: [address1.firstname, address2.firstname],
        sumOfZip: address1.zip + address2.zip
    }
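
The nested loops are a cross product with a filter. As a rough Python analogy, assuming two sample documents with hypothetical numeric zip values:

```python
from itertools import product

addresses = [
    {"firstname": "Clark", "zip": 10001},
    {"firstname": "John", "zip": 94105},
]

pairs = [
    {
        "pair": [a1["firstname"], a2["firstname"]],
        "sumOfZip": a1["zip"] + a2["zip"],
    }
    for a1, a2 in product(addresses, repeat=2)   # FOR ... FOR ...
    if a1["firstname"] != a2["firstname"]        # FILTER
]

print(pairs)
```

Each unordered pair appears twice, once in each order, because the cross product visits both (Clark, John) and (John, Clark).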

Remove Data

Finally, let's delete one of the user documents:

REMOVE "1040471" IN addresses

It deletes the user Clark (_key: "1040471"). We could also remove documents in a loop (same goes for INSERT, UPDATE and REPLACE):

FOR address IN addresses
    FILTER address.firstname == "Clark"
    REMOVE address IN addresses

The query deletes all addresses whose firstname is equal to Clark.
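
In plain Python, the same filter-then-remove pattern over an in-memory collection could look like this. The sample data is made up, and the dictionary merely stands in for the addresses collection.

```python
addresses = {
    "1": {"firstname": "Clark", "lastname": "Kent"},
    "2": {"firstname": "John", "lastname": "Doe"},
    "3": {"firstname": "Clark", "lastname": "Doe"},
}

# Collect the matching keys first, because a dict must not change size
# while it is being iterated.
to_remove = [key for key, doc in addresses.items() if doc["firstname"] == "Clark"]
for key in to_remove:
    del addresses[key]

print(addresses)
```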

Graphs

Creating graphs in C8 is very easy. Let's create the following two collections:

  • Document Collection -- persons
  • Edge Collection -- knows

In the GUI, you should see the persons and knows collections under Collections:

web-graphs-0(1)

Now insert data into the persons collection and the knows edge collection as follows.

persons:
web-graphcollection-2

knows:
web-graphcollection-1

Now click on the GRAPHS menu item and create a social graph from the GUI as shown below:

web-graphs-2b

Under the GRAPHS panel, you should now see the newly created social graph.

web-graphs-1

Click on the social graph tile and you should see a graph like the one below. You can adjust various display parameters for the graph by clicking the icon with three horizontal lines in the top right corner.

web-graphs-3

Note:
You can use C8QL to run various graph queries on this graph data model.

Streams

Streams are a type of collection in C8 that capture data-in-motion. Messages are sent via streams by publishers to consumers who then do something with the message.

C8 supports two types of persistent streams:

  • Local Streams - These streams are local to the region in which they were created.
  • Global Streams - These streams are geo-replicated across all regions of the fabric in which they were created.

Note
Every collection in C8 is also a local stream in each region where the geofabric exists. An app can therefore subscribe to this stream and get all updates to a collection in real time.
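
The idea that a collection doubles as a stream of its own changes can be sketched with a simple observer pattern. This is an in-memory illustration only, not the C8 driver API; the class name, the subscribe() method and the shape of the change events are assumptions made for the example.

```python
class ObservableCollection:
    """A collection that publishes every change to its subscribers (sketch)."""

    def __init__(self, name):
        self.name = name
        self.documents = {}
        self._subscribers = []

    def subscribe(self, callback):
        """Register a function that is called once per change event."""
        self._subscribers.append(callback)

    def _publish(self, event):
        for callback in self._subscribers:
            callback(event)

    def insert(self, key, doc):
        self.documents[key] = doc
        self._publish({"type": "insert", "key": key, "document": doc})

    def remove(self, key):
        del self.documents[key]
        self._publish({"type": "remove", "key": key})


addresses = ObservableCollection("addresses")
events = []
addresses.subscribe(events.append)  # the app reacts to pushes instead of polling

addresses.insert("1", {"firstname": "Clark"})
addresses.remove("1")
print(events)
```

The subscriber never queries the collection. It only receives the changes as they happen, which is the contrast with polling drawn earlier in this guide.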

The following image shows the list of all local streams corresponding to the collections available in that region.

web-streams-0

You can also create local and global streams that are purely data-in-motion. To create one, click on the STREAMS menu item and then on Add Stream. You will see a dialog box similar to the following:

web-streams-1

You can see the stats of any stream by clicking on the tile for that stream. You can also open a subscriber console by clicking the Open in Console button in the popup box.

web-streams-2

RESTQL

C8 enables you to create a geo-distributed backend as a service for your application within a few minutes.

The steps are as follows:

  1. Create collection(s).
  2. Create C8QL Queries. (Note: C8QL is a mix of JS + SQL)
  3. Save C8QL Queries. (Note: C8 automatically geo-distributes these queries.)
  4. Trigger the saved queries via REST API in your App.

You can do steps 1-3 above via the QUERIES menu item in the GUI.
web-restql-0

The following REST APIs are available for RESTQL. You can find them by clicking on the SUPPORT menu item.

web-restql-1(1)

You can execute RESTQL from your app via drivers or directly via curl as follows:

web-restql-3
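
The save-once, trigger-by-name workflow from the steps above can be modeled in a few lines of Python. This is a conceptual sketch only: it does not call the C8 REST API, and the SavedQueries class, the query name and the use of keyword parameters are all assumptions made for illustration.

```python
class SavedQueries:
    """In-memory registry of named, parameterised queries (sketch only)."""

    def __init__(self, collections):
        self._collections = collections
        self._queries = {}

    def save(self, name, query):
        """Store a query function under a name (steps 2 and 3)."""
        self._queries[name] = query

    def execute(self, name, **params):
        """Run a saved query by name with caller-supplied parameters (step 4)."""
        if name not in self._queries:
            raise KeyError(f"no saved query named {name!r}")
        return self._queries[name](self._collections, **params)


# Step 1: create the collection(s).
collections = {
    "addresses": [
        {"firstname": "Clark", "lastname": "Kent"},
        {"firstname": "John", "lastname": "Doe"},
    ]
}


def first_names_by_lastname(collections, lastname):
    """Stand-in for a stored query that filters on lastname."""
    return [
        doc["firstname"]
        for doc in collections["addresses"]
        if doc["lastname"] == lastname
    ]


backend = SavedQueries(collections)
backend.save("firstNamesByLastname", first_names_by_lastname)
print(backend.execute("firstNamesByLastname", lastname="Doe"))
```

The client only knows the query name and its parameters. The query text lives on the server side, so it can change without touching the applications that call it.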

RESTQL provides several advantages to an app developer including better performance, higher productivity, ease of use, increased scalability, maintainability and security.

Performance - RESTQL executes local to the data. This minimizes the use of slow networks, reduces network traffic, and improves round-trip response time. Applications benefit because result sets are processed next to the data, which eliminates network bottlenecks.

Productivity and Ease of Use - RESTQL lets you build a geo-distributed backend in a few minutes, compared to the weeks or months it would take to build a geo-distributed backend engine for your application yourself.

Scalability - RESTQL queries are automatically geo-distributed to all locations available in the geofabric in which they are created. This allows your application to execute the backend logic locally in each location.

Maintainability - Once a RESTQL query is validated, it can be used with confidence in any number of applications. If its definition changes, only the RESTQL query is affected, not the applications that call it. This simplifies maintenance and enhancement. Maintaining a RESTQL query is also easier than maintaining code in various client apps.

Security - You can restrict access by allowing users to manipulate the data only through RESTQL queries that execute with their definer's privileges.

Your development agility comes from building systems composed of small, independent units of functionality focused on doing one thing well. RESTQL lets you build and deploy services in a geo-distributed manner at the level of a single function, enabling you to build a sophisticated backend engine for your app within a few minutes.