An overview of Deployment architecture

Modified on Wed, 9 Apr at 3:36 AM

This article gives an overview of the platform's cloud-native deployment architecture, which incorporates best practices for high availability, scalability, data security, and networking.

Deployment Architecture

The architecture presented is a multi-layered, cloud-native design that seamlessly integrates various services to create a resilient and scalable system. Let's break down each component to understand how they function together.

Core Infrastructure

At the heart of this design is a Virtual Private Cloud (VPC) that provides network isolation and security. Within this VPC lies a compute layer structured for maximum reliability:

Design ready for high availability and redundancy: Today, the system uses two separate availability zones (Zone A and Zone B), each with its own compute instances. This layout makes the architecture fundamentally ready for high availability and redundancy in the future, so that the application can continue to function even if an entire zone experiences an outage.
Note: Each availability zone will have compute engine instances, creating redundancy that will eliminate single points of failure and allow for load balancing.
NAT Instance: The NAT instance in the public subnet lets the private instances initiate outbound traffic to the internet and to the other managed services used by the Cubyts architecture.

User Access and Load Distribution

The architecture implements intelligent access management:

Cloud Load Balancer: Acts as the entry point for user traffic, distributing incoming requests evenly across available compute instances. This not only improves performance but also enhances stability during high-traffic periods.
Internet Connectivity: A secure connection to the internet allows for external service integration while maintaining security protocols.
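As a rough illustration of how even distribution across two zones can behave, the sketch below cycles through instances in round-robin order and skips any that are marked unhealthy. The instance names, zone labels, and the `RoundRobinBalancer` class are hypothetical; the managed cloud load balancer used by the platform is configured rather than hand-written, and its actual algorithm is not described in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    name: str          # hypothetical instance name
    zone: str          # hypothetical zone label, e.g. "zone-a"
    healthy: bool = True


class RoundRobinBalancer:
    """Cycles through instances in order, skipping unhealthy ones."""

    def __init__(self, instances: List[Instance]) -> None:
        self._instances = instances
        self._next = 0

    def pick(self) -> Instance:
        # Try each instance at most once per call, starting where we left off.
        for _ in range(len(self._instances)):
            candidate = self._instances[self._next]
            self._next = (self._next + 1) % len(self._instances)
            if candidate.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


def mark_zone_down(instances: List[Instance], zone: str) -> None:
    """Simulates a zone outage by marking every instance in that zone unhealthy."""
    for inst in instances:
        if inst.zone == zone:
            inst.healthy = False


pool = [
    Instance("a1", "zone-a"),
    Instance("b1", "zone-b"),
    Instance("a2", "zone-a"),
    Instance("b2", "zone-b"),
]
balancer = RoundRobinBalancer(pool)
```

With all instances healthy, requests alternate between the two zones; after `mark_zone_down(pool, "zone-a")`, every request lands on a Zone B instance.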

Orchestration Layer

The orchestration layer provides automation and coordination capabilities:

Scheduler Component: Manages automated tasks, workflow orchestration, and scheduled activities, reducing manual intervention and ensuring consistent operations.

Data Management

The architecture employs a multi-database strategy to handle different data requirements:

SQL Database: Handles structured data that requires relational integrity and complex queries.
NoSQL Database: Provides document-oriented storage for semi-structured and unstructured data, offering flexibility and scalability.
Cloud Storage: Manages large files, backups, media assets, and other unstructured data that doesn't fit neatly into database paradigms.

Communication and Messaging

Modern applications require robust messaging capabilities:

Pub-Sub: Implements a publish-subscribe pattern for asynchronous communication between system components.
SMTP: Handles email communications for notifications, alerts, and user interactions.
Functions: Both subscription and email processes are handled by dedicated serverless functions, allowing for event-driven architecture and automatic scaling.
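The publish-subscribe pattern described above can be sketched in a few lines. This is an in-process, synchronous stand-in written only to show the decoupling between publishers and subscribers; the platform itself relies on a managed Pub/Sub service and serverless functions, which deliver messages asynchronously. The class name, topic name, and message fields are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class InMemoryPubSub:
    """Minimal publish-subscribe bus: publishers never call subscribers directly."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        # Deliver the message to every handler registered for the topic
        # and report how many handlers received it.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Hypothetical usage: an "email function" subscribes to an invitation topic.
outbox: List[str] = []
bus = InMemoryPubSub()
bus.subscribe("user.invited", lambda msg: outbox.append(msg["email"]))
delivered = bus.publish("user.invited", {"email": "a@example.com"})
```

The publisher only knows the topic name, so new subscribers (for example, an audit logger) can be added without touching the publishing code.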

AI and ML Integration

The architecture embraces artificial intelligence capabilities:

Google Vertex AI: Provides machine learning capabilities for advanced analytics, predictions, and intelligent features without requiring extensive ML infrastructure management.

Monitoring and Observability

A comprehensive observability stack ensures system health:

Datadog Integration: Provides monitoring, logging, and observability tools to track system performance, identify issues before they impact users, and ensure compliance with service level objectives.

Key Benefits of this Architecture

This cloud-native architecture delivers several significant advantages:

High Availability: With multiple availability zones and redundant components, the system can withstand individual component failures without downtime.
Security: The VPC provides network isolation, with additional security measures possible at each layer.
Flexibility: The multi-database approach and serverless components allow for choosing the right tool for each specific requirement.
Future-Proof: AI integration capabilities prepare the system for advanced analytics and intelligent features.

Security considerations for the Deployment Architecture

This chapter outlines the security measures implemented in the deployment architecture, which incorporates multiple security layers and controls to protect data and services at every level. The following sections detail these measures for each component of the architecture.

Core Infrastructure Security

At the heart of the design is a Virtual Private Cloud (VPC) that provides critical network isolation and security boundary controls:

Network Segmentation: The VPC establishes a clear security perimeter with controlled ingress/egress points, enabling comprehensive network traffic monitoring and filtering.
Multi-Zone Security Strategy: Two separate availability zones (Zone A and Zone B) are implemented not only for reliability but also as an important security control that prevents a compromise in a single zone from affecting the entire system.
Compute Instance Isolation: Each availability zone contains compute engine instances (for various workloads) that are completely isolated from the external world by security groups and well-configured access controls.

Perimeter Security and Access Controls

The architecture implements multiple layers of access controls to protect the system perimeter:

Cloud Load Balancer: We've implemented a secure load balancer as our first line of defense against attacks, with properly configured TLS & HTTP security headers. Our implementation includes request filtering, rate limiting, and anomaly detection to prevent common web attacks.
Internet Gateway Controls: All connections to the internet are secured through our comprehensive egress filtering, outbound connection monitoring, and appropriate security protocols to prevent data exfiltration and command-and-control attacks.
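Rate limiting, mentioned above as one of the load balancer's protections, is commonly implemented with a token bucket. The sketch below is one such approach under assumed parameters; the article does not state which algorithm or limits the platform's load balancer actually uses, and the `TokenBucket` class and its arguments are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` requests, refilled at a steady rate."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock          # injectable so the limiter can be tested
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(
            float(self.capacity), self._tokens + elapsed * self.refill_per_second
        )
        # Spend one token per request; reject when the bucket is empty.
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

In practice one bucket would be kept per client key (for example, per source IP), and a rejected request would typically receive an HTTP 429 response.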

Orchestration Layer Security

The orchestration layer includes specific security controls to protect automated processes:

Scheduler Component Authentication: Strong authentication mechanisms and least-privilege access controls protect our scheduler component, which manages automated tasks and workflows. All credentials are securely managed through our enterprise secrets management system.
Orchestration Audit Logging: Comprehensive logging of all orchestration activities is enabled, particularly for changes to scheduled tasks, with log data centrally collected and monitored for suspicious activities.

Data Security Controls

The architecture employs a multi-database strategy to handle different data requirements with specific security controls for each data store:

SQL Database Security: Structured data is protected with encryption at rest and in transit, robust authentication controls, comprehensive privilege management, prepared statements to prevent SQL injection, and continuous database activity monitoring. The database is fully secured behind the VPC.
NoSQL Database Security: The NoSQL database is secured with authentication enabled by default, role-based access controls, network security controls to restrict access, and proper encryption for sensitive data fields. The database is fully secured behind the VPC.
Cloud Storage Security: All storage buckets/containers have properly configured access policies, sensitive data is encrypted, access credentials are securely managed, object lifecycle policies are in place, and access logging is enabled for all operations.
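To show why prepared (parameterized) statements prevent SQL injection, the following sketch uses Python's built-in `sqlite3` module as a stand-in for the platform's SQL database. The `users` table, its columns, and the `find_user` helper are hypothetical and exist only for this illustration.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, email: str):
    # The "?" placeholder passes the value separately from the SQL text,
    # so user input is always treated as data and never as SQL syntax.
    cursor = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cursor.fetchall()


# Hypothetical in-memory table used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)

# A classic injection payload is matched literally and finds nothing.
injected = find_user(conn, "' OR '1'='1")
```

Had the query been built by string concatenation, the same payload would have turned the `WHERE` clause into a condition that is always true and returned every row.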

AI and ML Security Implementation

All AI components in the deployment architecture include specialized security measures to address unique challenges:

Google Vertex AI Data Protection: All sensitive data used for model training and inference is properly protected, with controls for data minimization, anonymization where appropriate, and secure transfer between components.
AI Ethics and Compliance: Our governance framework ensures AI outputs comply with regulatory requirements and ethical standards, particularly for decisions affecting users. Regular reviews ensure fair and unbiased operation.

Network Architecture

The Cubyts network architecture leverages the GCP Virtual Private Cloud (VPC) infrastructure to run the platform in a controlled, private and secure environment. The key characteristics of the architecture are as follows:

Availability zone: The entire infrastructure is spread across multiple availability zones within a GCP region.
Public subnets: The public subnets are routed using a route table that has a default route to the internet via an internet gateway.
The NAT instance in the public subnet lets the private instances initiate outbound traffic to the internet and to the other managed services used by the Cubyts architecture.
The NAT instance is highly available in a single availability zone.
Private subnets for Cloud VM instances: The private subnets, spread across multiple availability zones, contain infrastructure that is not directly accessible from the internet (this infrastructure has only private IPs). The resources in the private subnets (Cloud VMs) reach the internet or other services through the NAT instance.
GCP Managed services: The Cubyts architecture uses managed services such as Pub/Sub, Cloud Functions, and Cloud SQL. GCP manages their scale and their configuration across availability zones.
MongoDB Atlas Services: MongoDB Atlas is our data lake, where Cubyts stores the data it consumes from the products with which it integrates.
Weaviate Vector DB: Weaviate vectorization is used in the Cubyts pattern engine.
Gen AI endpoints: The Cubyts prediction engine uses Gen AI endpoints (which are configured in the US-central region).

Note: All systems storing data are located in the South Asia region (except the Gen AI endpoints, as stated earlier).
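The routing behaviour described for the private subnets can be sketched as a simple next-hop decision: traffic to an address inside the VPC stays local, while everything else leaves through the NAT instance. The CIDR range and the hop labels below are assumptions chosen for illustration; the actual address plan is not given in this article.

```python
import ipaddress

# Hypothetical VPC address range; the real range is not documented here.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")


def next_hop(destination: str) -> str:
    """Returns where a private instance would send traffic for `destination`."""
    address = ipaddress.ip_address(destination)
    if address in VPC_CIDR:
        # Destination is another resource inside the VPC.
        return "local"
    # Private instances have no public IP, so outbound traffic uses the NAT.
    return "nat-instance"
```

For example, `next_hop("10.0.3.7")` stays inside the VPC, while `next_hop("8.8.8.8")` is sent through the NAT instance.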

Data flow

(Data flow)

User and Application Metadata Subsystem: All user and application metadata are stored in Cloud SQL (PostgreSQL), which is inside the VPC. Data is irreversibly encrypted where applicable.
Data Loader Subsystem: The data loader subsystem is located inside the VPC. It fetches the tokens for the respective products from Cloud SQL (PostgreSQL) and uses them to extract data from those products. The extracted data is stored in the data lake subsystem.
Data Lake Subsystem: The data lake subsystem is based on MongoDB (Atlas) and stores all the data Cubyts extracts from the products with which it integrates (Jira, Figma, etc.). Integration with the other subsystems located in GCP happens via VPC peering. The data here is processed to create data points and expose them as raw and processed insights. The Cubroid subsystem is used for this.
Cubroid Subsystem (Pattern engine): This is the AI/ML-based subsystem that helps customers draw insights and make interventions based on the data. It uses the services offered by the Weaviate vector database.
Products with which Cubyts integrates: There are two kinds of products with which Cubyts integrates:
1. The first group consists of products from which Cubyts extracts data and stores it in the data lake. These are products like Jira, Figma, Freshdesk, etc.
2. The second group consists of products from which Cubyts extracts data on the fly to draw insights but does not store it anywhere. These are products like GitHub, Bitbucket, Jenkins, etc.
Data transfer between subsystems: Data transfer between subsystems is secured in three different ways depending on the context:
1. VPC Peering: There are cases where the subsystems are located inside different VPCs. In these cases, the connection is secured via a private VPC peering.
2. TLS: There are cases where data exchange between two subsystems happens via REST, in which case TLS is used to secure the data exchange.
3. Inter-subsystem encryption: There are cases where data exchange between two subsystems does not happen via REST and in such cases, the data is secured via encryption.
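For the TLS case above, a client-side configuration might look like the following sketch, which uses Python's standard `ssl` module. The minimum protocol version is an assumption made for illustration; the article does not state which TLS versions the platform enforces, and `strict_client_context` is a hypothetical helper name.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Builds a client TLS context that verifies the server it connects to."""
    # create_default_context() enables certificate verification and
    # hostname checking using the system's trusted CA certificates.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = strict_client_context()
```

Such a context can be passed to standard clients, for example `urllib.request.urlopen(url, context=context)`, so that REST calls between subsystems fail if the server's certificate cannot be verified.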

Information security policy

Cubyts is committed to safeguarding the data in its possession. Our customers and other stakeholders depend on us to protect their resources. We have taken the following measures to make information security a company-wide initiative and policy at Cubyts. The key aspects are as follows:

Information Security Officer (ISO): The Information Security Officer leads all the information-security efforts of the company. The Information Security Officer is responsible for identifying potential security threats and risks to the organisation, formulating and implementing policies and procedures to combat such threats, and monitoring the efficacy of existing systems.
Incident Management: We have a comprehensive Incident Management and Response Policy that outlines guidelines for reporting and responding to security incidents in an efficient manner.
Vulnerability Management: We have a well-defined process for identifying, classifying, prioritising, mitigating, and remediating security vulnerabilities.
Data Classification: Data is classified as public data (that can be shared), company confidential (that should not be disclosed), customer confidential (which includes encrypted data in the Cubyts platform, including PII data of our users), or personal data (all data that relates to an identified or identifiable individual). Comprehensive management of these classes, through technology and otherwise, is in place to protect each data type.
Data backup:
1. The data in the platform is backed up periodically, and the platform backup policy includes the ability to restore up to the state of the last 1 hour.
2. Periodic drills help us ensure that data can be restored as quickly as possible, with minimal disruption of service.
The implementation of the status page ensures that our customers are aware of any downtime and maintenance schedules.
Data processing: As a GDPR and SOC compliant company, we have a comprehensive data processing policy (documented in our legal policies here).
Data encryption: We apply multiple levels of data encryption to ensure sufficient protection of the data managed by the Cubyts platform:
1. The database is encrypted using state-of-the-art algorithms offered as a service by GCP.
2. Important data in the database, including PII data, is further encrypted (a second level of data security) using GCP Key Management Service and cryptography algorithms offered by GCP.
Endpoint security: While the infrastructure is protected, endpoint access to the infrastructure is equally protected against unauthorised access to systems and data. The following initiatives ensure endpoint security:
1. Endpoint hardware uses antivirus software to protect itself and critical systems from malware attacks.
2. The hard disk of endpoint hardware is encrypted, e.g. using FileVault on Mac or disk encryption on Windows.
Staff adhere to a centrally monitored strong password policy, augmented by automatic screen lock on their systems after a reasonable period of inactivity.
In addition, data transfer from endpoint devices is controlled as per the internal policies.
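The one-hour restore capability in the data backup policy above can be read as a recovery point objective: at any moment, the most recent backup should be no more than one hour old. The sketch below expresses that reading as a simple check. This interpretation, the `RESTORE_WINDOW` constant, and the `meets_restore_window` helper are assumptions for illustration, not a description of the platform's backup tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumed reading of the policy: data can be restored to a state
# that is at most one hour old.
RESTORE_WINDOW = timedelta(hours=1)


def meets_restore_window(last_backup: datetime, now: datetime) -> bool:
    """Returns True if the latest backup is recent enough for the policy."""
    if last_backup > now:
        raise ValueError("last_backup is in the future")
    return now - last_backup <= RESTORE_WINDOW


# Hypothetical timestamps for the example.
checked_at = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
recent_backup = checked_at - timedelta(minutes=45)
stale_backup = checked_at - timedelta(minutes=61)
```

A monitoring job could run such a check on a schedule and raise an alert when it returns `False`.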

SSAE 18 SOC2 compliance

The Cubyts platform is SSAE 18 SOC 2 Type 1 compliant as of Jan 1st 2025. We are happy to share an abridged version of the report on Security, Confidentiality and Availability on demand.

Conclusion

Thank you for taking the time to delve into the deployment architecture and associated security measures implemented by the Cubyts platform. Our commitment to leveraging AI and data to enhance the SDLC experience underscores our dedication to improving the efficacy, efficiency and quality of product development for your teams. By unifying insights and streamlining execution, Cubyts not only addresses the complexities of modern product stacks but also empowers teams to make informed decisions and achieve better outcomes.

As we continue to innovate and evolve, we remain steadfast in our mission to provide a secure, scalable, and insightful platform that meets the dynamic needs of today's product leaders. We appreciate your interest and look forward to supporting your journey towards more effective and observable product development.

For any further information or to request an abridged version of our SSAE 18 SOC 2 report, please contact us at info@cubyts.com

About Cubyts

Cubyts is your AI Assistant for Proactive SDLC Governance. Cubyts integrates seamlessly with the tools teams use to build their products (e.g. Jira, GitHub, Figma, Jenkins, etc.). The platform enhances the development process with real-time insights, proactive flagging of potential issues, and data-driven decision-making, ensuring superlative Developer, Engineering and Execution excellence.