
22 posts tagged with "gcp"


· 4 min read
Jeffrey Aven

cloudsql federated queries

This article demonstrates Cloud SQL federated queries for BigQuery, a neat and simple-to-use feature.

Connecting to Cloud SQL

One of the challenges presented when using Cloud SQL on a private network (VPC) is providing access to users. There are several ways to accomplish this which include:

  • open the database port on the VPC firewall (for example, 5432 for Postgres) and let users access the database using a command-line or locally installed GUI tool (this may not be allowed in your environment)
  • provide a web-based interface deployed on your VPC, such as pgAdmin deployed on a GCE instance or GKE pod (adds security and management overhead)
  • use the Cloud SQL proxy (requires additional software to be installed and configured)

In addition, all of the above solutions require direct IP connectivity to the instance, which may not always be available. Furthermore, each of these options requires the user to present some form of authentication – in many cases a database user name and password which must then be managed at an individual level.

Enter Cloud SQL federated queries for BigQuery…

BigQuery Federated Queries for Cloud SQL

BigQuery allows you to query tables and views in Cloud SQL (currently MySQL and PostgreSQL) using the federated queries feature. These queries can, for example, be wrapped in authorized views within BigQuery datasets.

This has the following advantages:

  • Allows users to authenticate and use the GCP console to query Cloud SQL
  • Does not require direct IP connectivity from the user, or additional routes or firewall rules
  • Leverages Cloud IAM as the authorization mechanism – rather than unmanaged database user accounts and object-level permissions
  • External queries can be executed against a read replica of the Cloud SQL instance to offload query IO from the master instance

Setting it up

Setting up BigQuery federated queries for Cloud SQL is exceptionally straightforward; a summary of the steps is provided below:

Step 1. Enable a Public IP on the Cloud SQL instance

This sounds bad, but it isn't really. You need to enable a public interface so that BigQuery can establish a connection to Cloud SQL; however, this is not accessed over the public internet – rather, it is accessed through the Google network (the back end of the front end, if you will).

Furthermore, you configure an empty list of authorized networks, which effectively shields the instance from the public network. This can be configured in Terraform as shown here:
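A minimal sketch of the relevant configuration, assuming a google_sql_database_instance resource (names and variables are illustrative):

resource "google_sql_database_instance" "cloudsql_instance" {
  name             = "cloudsql-instance"
  region           = var.region
  project          = var.project
  database_version = "POSTGRES_9_6"

  settings {
    tier = var.tier

    ip_configuration {
      # a public IP is enabled so BigQuery can reach the instance...
      ipv4_enabled = true
      # ...but no authorized_networks blocks are defined, so the instance
      # accepts no direct connections from the public internet
    }
  }
}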

This configuration change can be made to a running instance as well as during the initial provisioning of the instance.

As shown below you will get a warning dialog in the console saying that you have no authorized networks - this is by design.

Cloud SQL Public IP Enabled with No Authorized Networks

Step 2. Create a BigQuery dataset which will be used to execute queries against Cloud SQL

Connections to Cloud SQL are defined in a BigQuery dataset, which can also be used to control access to Cloud SQL using authorized views governed by IAM roles.
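As a sketch, assuming Terraform is used for this step as well (the dataset ID is illustrative):

resource "google_bigquery_dataset" "cloudsql_federated" {
  project    = var.project
  dataset_id = "cloudsql_federated"
  location   = var.region   # choose a location compatible with the Cloud SQL instance's region
}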

Step 3. Create a connection to Cloud SQL

To create a connection to Cloud SQL from BigQuery you must first enable the BigQuery Connection API; this is done at the project level.
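If you manage project services with Terraform, a minimal sketch for enabling the API (using the google_project_service resource) would be:

resource "google_project_service" "bigquery_connection_api" {
  project = var.project
  service = "bigqueryconnection.googleapis.com"
}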

As this is a fairly recent feature, there isn't great coverage for creating connections in either the bq tool or the BigQuery client libraries yet, so we will use the console for now...

Under the Resources -> Add Data link in the left-hand panel of the BigQuery console UI, select Create Connection. You will see a side panel with a form to enter connection details for your Cloud SQL instance.

In this example I will set up a connection to a Cloud SQL read replica instance I have created:

Creating a BigQuery Connection to Cloud SQL

More information on the BigQuery Connection API can be found at: https://cloud.google.com/bigquery/docs/reference/bigqueryconnection/rest

The following permissions are associated with connections in BigQuery:

bigquery.connections.create  
bigquery.connections.get
bigquery.connections.list
bigquery.connections.use
bigquery.connections.update
bigquery.connections.delete

These permissions are conveniently combined into the following predefined roles:

roles/bigquery.connectionAdmin    (BigQuery Connection Admin)         
roles/bigquery.connectionUser (BigQuery Connection User)
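As an illustrative sketch, granting a user the connection user role with Terraform (the member shown is hypothetical) could look like:

resource "google_project_iam_member" "bq_connection_user" {
  project = var.project
  role    = "roles/bigquery.connectionUser"
  member  = "user:analyst@example.com"   # illustrative member
}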

Step 4. Query away!

Now the connection to Cloud SQL can be accessed using the EXTERNAL_QUERY function in BigQuery, as shown here:

Querying Cloud SQL from BigQuery
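The screenshot isn't reproduced in this extract; a query of this general shape (the connection ID, external SQL and table name are illustrative) would be:

SELECT *
FROM EXTERNAL_QUERY(
  'my-project.australia-southeast1.my-cloudsql-connection',
  'SELECT id, name, created_at FROM public.my_table;'
);

The second argument is executed as-is on the Cloud SQL instance, so it is written in the source database's SQL dialect.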

If you have enjoyed this post, please consider buying me a coffee ☕ to help me keep writing!

· 5 min read
Jeffrey Aven

CloudSQL HA

In this post we will look at read replicas as an additional method to achieve multi-zone availability for Cloud SQL, which in turn gives us the ability to offload (potentially expensive) IO operations such as user-created backups or read operations without adding load to the master instance.

In the previous post in this series we looked at Regional availability for PostgreSQL HA using Cloud SQL:

Google Cloud SQL – Availability, Replication, Failover for PostgreSQL – Part I

Recall that this option was simple to implement and worked relatively seamlessly and transparently with respect to zonal failover.

Now let's look at read replicas in Cloud SQL as an additional measure for availability.

Deploying Read Replica(s)

Deploying read replicas is slightly more involved than simple regional (high) availability, as you will need to define each replica as a separate Cloud SQL instance which is a slave to the primary (master) instance.

An example using Terraform is provided here, starting by creating the master instance:
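The full Terraform source isn't reproduced in this extract; a condensed sketch of a master instance (resource and variable names are illustrative) could look like:

resource "google_sql_database_instance" "master_instance" {
  name             = "postgresql-instance-master"
  region           = var.region
  project          = var.project
  database_version = "POSTGRES_9_6"

  settings {
    tier      = var.tier
    disk_type = "PD_SSD"

    backup_configuration {
      enabled    = true
      start_time = "00:00"
    }

    maintenance_window {
      day          = 7
      hour         = 0
      update_track = "stable"
    }

    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.private_network.self_link
    }
  }
}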

Next you would specify one or more read replicas (typically in a zone other than the zone the master is in):
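A corresponding sketch for a read replica (again with illustrative names, and var.replica_zone assumed to be a different zone from the master) might be:

resource "google_sql_database_instance" "read_replica" {
  name                 = "postgresql-instance-replica"
  region               = var.region
  project              = var.project
  database_version     = "POSTGRES_9_6"
  master_instance_name = google_sql_database_instance.master_instance.name

  replica_configuration {
    failover_target = false
  }

  settings {
    tier = var.tier
    # backup_configuration and maintenance_window are deliberately omitted -
    # these operations cannot be performed on a read replica
    location_preference {
      zone = var.replica_zone   # a different zone from the master
    }
    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.private_network.self_link
    }
  }
}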

Note that several of the options supplied previously are omitted when creating a read replica instance, such as the backup and maintenance options, as these operations cannot be performed on a read replica (as we will see later).

Cloud SQL Instances - showing master and replica
Cloud SQL Master Instance

Voila! You have just set up a master instance (the primary instance your application and/or users will connect to) along with a read replica in a different zone which will be asynchronously updated as changes occur on the master instance.

Read Replicas in Action

Now that we have created a read replica, let's see it in action. After connecting to the read replica (as you would any other instance), attempt to access a table that has not yet been created on the master, as shown here:

SELECT operation from the replica instance

Now create the table and insert some data on the master instance:

Create a table and insert a record on the master instance

Now try the select operation on the replica instance:

SELECT operation from the replica instance (after changes have been made on the master)

It works!

Some Points to Note about Cloud SQL Read Replicas

  • Users connect to a read replica as a normal database connection (as shown above)
  • Google managed backups (using the console or gcloud sql backups create .. ) can NOT be performed against replica instances
  • Read replicas can be used to offload IO-intensive operations from the master instance - such as user-managed backup operations (e.g. pg_dump)
pg_dump operation against a replica instance
  • BE CAREFUL: despite their name, read replicas are NOT read-only; updates can be made which will NOT propagate back to the master instance - you could get yourself into an awful mess if you allow users to perform INSERT, UPDATE, DELETE, CREATE or DROP operations against replica instances.

Promoting a Read Replica

If required, a read replica can be promoted to a standalone Cloud SQL instance, which provides another DR option. Keep in mind, however, that as the read replica is updated asynchronously, promoting a read replica may result in a loss of data (hopefully not much, but a loss nonetheless). Your application's RPO will dictate whether this is acceptable or not.

Promotion of a read replica is reasonably straightforward as demonstrated here using the console:

Promoting a read replica using the console

You can also use the following gcloud command:

gcloud sql instances promote-replica <replica_instance_name>

Once you click on the Promote Replica button you will see the following warning:

Promoting a read replica using the console

This simply states that once you promote the replica, it becomes an independent instance with no further relationship to the master instance. Once accepted and the promotion process is complete, you can see that you now have two independent Cloud SQL instances (as advertised!):

Promoted Cloud SQL instance

Some of the options you would normally configure with a master instance would need to be configured on the promoted replica instance - such as high availability, maintenance and scheduled backups - but in the event of a zonal failure you would be back up and running with virtually no data loss!

Full source code for this article is available at: https://github.com/gamma-data/cloud-sql-postgres-availability-tutorial

If you have enjoyed this post, please consider buying me a coffee ☕ to help me keep writing!

· 5 min read
Jeffrey Aven

CloudSQL HA

In this multi-part blog we will explore the features available in Google Cloud SQL for high availability, backup and recovery, replication, failover and security (at rest and in transit) for the PostgreSQL DBMS engine. Some of these features are relatively hot off the press and in Beta – which still makes them available for general use.

We will start by looking at the High Availability (HA) options available to you when using the PostgreSQL engine in Google Cloud SQL.

Most of you would be familiar with the concepts of high availability, redundancy, fault tolerance and so on, but let's start with a definition of HA anyway:

High availability (HA) is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.

Wikipedia

"Higher than normal" is quite subjective; typically this is quantified as a percentage expressed as a number of "9s" – for example 99.99% (quoted as "four nines") would allot you 52.60 minutes of downtime over a one-year period.
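As a quick check of that figure: 365.25 days × 24 hours × 60 minutes × (1 − 0.9999) ≈ 52.6 minutes of permissible downtime per year.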

Essentially, the number of 9s required will drive your bias towards the options available to you for Cloud SQL HA.

We will start with Cloud SQL HA in its simplest form, Regional Availability.

Regional Availability

Knowing what we know about the Google Cloud Platform, regional availability means that our application or service (in this case Cloud SQL) should be resilient to a failure of any one zone in our region. In fact, as all GCP regions have at least 3 zones – two zones could fail, and our application would still be available.

Regional availability for Cloud SQL (which Google refers to as High Availability) creates a standby instance in addition to the primary instance and uses a regional Persistent Disk resource to store the database data, transaction logs and other state files, synchronously replicated to Persistent Disk resources local to the zones in which the primary and standby instances are located.

A shared IP address (like a Virtual IP) is used to serve traffic to the healthy (normally primary) Cloud SQL instance.

An overview of Cloud SQL HA is shown here:

Cloud SQL High Availability

Implementing High Availability for Cloud SQL

Implementing regional availability for Cloud SQL is dead simple – it is one argument:

availability_type = "REGIONAL"

Using the gcloud command line utility, this would be:

gcloud sql instances create postgresql-instance-1234 \
--availability-type=REGIONAL \
--database-version=POSTGRES_9_6

Using Terraform (with a complete set of options) it would look like:

resource "google_sql_database_instance" "postgres_ha" {
provider = google-beta
region = var.region
project = var.project
name = "postgresql-instance-${random_id.instance_suffix.hex}"
database_version = "POSTGRES_9_6"
settings {
tier = var.tier
disk_size = var.disk_size
activation_policy = "ALWAYS"
disk_autoresize = true
disk_type = "PD_SSD"
**availability_type = "REGIONAL"**
backup_configuration {
enabled = true
start_time = "00:00"
}
ip_configuration {
ipv4_enabled = false
private_network = google_compute_network.private_network.self_link
}
maintenance_window {
day = 7
hour = 0
update_track = "stable"
}
}
}

Once deployed, you will notice a few different items in the console. First, from the instance overview page you can see that the High Availability option is ENABLED for your instance.

Second, you will see a Failover button enabled on the detailed management view for this instance.

Failover

Failovers and failbacks can be initiated manually or automatically (should the primary be unresponsive). A manual failover can be invoked by executing the command:

gcloud sql instances failover postgresql-instance-1234

There is an --async option which will return immediately, invoking the failover operation asynchronously.

Failover can also be invoked from the Cloud Console using the Failover button shown previously. As an example I have created a connection to a regionally available Cloud SQL instance and started a command which runs a loop and prints out a counter:

Now using the gcloud command shown earlier, I have invoked a manual failover of the Cloud SQL instance.

Once the failover is initiated, the client connection is dropped (as the server is momentarily unavailable):

The connection can be immediately re-established afterwards; the state of the running query is lost, but importantly no data is lost. If your application clients had retry logic in their code and they weren't executing a long-running query, chances are no one would notice! Once reconnected, normal database activities can be resumed:

A quick check of the instance logs will show that the failover event has occurred:

Now when you return to the instance page in the console you will see a Failback button, which indicates that your instance is being served by the standby:

Note that there may be a slight delay in the availability of this option as the replica is still being synced.

It is worth noting that nothing comes for free! When you run in REGIONAL or High Availability mode you are effectively paying double the cost compared to running in ZONAL mode. However, availability and cost have always been trade-offs against one another - you get what you pay for...

More information can be found at: https://cloud.google.com/sql/docs/postgres/high-availability

Next up we will look at read replicas (and their ability to be promoted) as another high availability alternative in Cloud SQL.

If you have enjoyed this post, please consider buying me a coffee ☕ to help me keep writing!

· 5 min read
Jeffrey Aven

aws to gcp thesaurus

There are many posts available which map analogous services between the different cloud providers, but this post attempts to go a step further and map additional concepts, terms, and configuration options to be the definitive thesaurus for cloud practitioners familiar with AWS looking to fast track their familiarisation with GCP.

It should be noted that AWS and GCP are fundamentally different platforms; nowhere is this more apparent than in the way networking is implemented by the two providers – see: GCP Networking for AWS Professionals

This post is focused on the core infrastructure, networking and security services offered by the two major cloud providers; I will do a future post on higher-level services such as the ML/AI offerings from the respective providers.

Furthermore, this will be a living post which I will continue to update; I encourage comments from readers on additional mappings, which I will incorporate into the post as well.

I have broken this down into sections based upon the layout of the AWS Console.

Compute

EC2 (Elastic Compute Cloud) → GCE (Google Compute Engine)
Availability Zone → Zone
Instance → VM Instance
Instance Family → Machine Family
Instance Type → Machine Type
Amazon Machine Image (AMI) → Image
IAM Role (for an EC2 Instance) → Service Account
Security Groups → VPC Firewall Rules (ALLOW)
Tag → Label
Termination Protection → Deletion Protection
Reserved Instances → Committed Use Discounts
Capacity Reservation → Reservation
User Data → Startup Script
Spot Instances → Preemptible VMs
Dedicated Instances → Sole Tenancy
EBS Volume → Persistent Disk
Auto Scaling Group → Managed Instance Group
Launch Configuration → Instance Template
ELB Listener → URL Map (Load Balancer)
ELB Target Group → Backend/Instance Group
Instance Storage (ephemeral) → Local SSDs
EBS Snapshots → Snapshots
Keypair → SSH Keys
Elastic IP → External IP
Lambda → Google Cloud Functions
Elastic Beanstalk → Google App Engine
Elastic Container Registry (ECR) → Google Container Registry (GCR)
Elastic Container Service (ECS) → Google Kubernetes Engine (GKE)
Elastic Kubernetes Service (EKS) → Google Kubernetes Engine (GKE)
AWS Fargate → Cloud Run
AWS Service Quotas → Allocation Quotas
Account (within an Organisation) → Project
Region → Region
AWS CloudFormation → Cloud Deployment Manager

Storage

Simple Storage Service (S3) → Google Cloud Storage (GCS)
Standard Storage Class → Standard Storage Class
Infrequent Access Storage Class → Nearline Storage Class
Amazon Glacier → Coldline Storage Class
Lifecycle Policy → Retention Policy
Tags → Labels
Snowball → Transfer Appliance
Requester Pays → Requester Pays
Region → Location Type/Location
Object Lock → Hold
Vault Lock (Glacier) → Bucket Lock
Multi Part Upload → Parallel Composite Transfer
Cross-Origin Resource Sharing (CORS) → Cross-Origin Resource Sharing (CORS)
Static Website Hosting → Bucket Website Configuration
S3 Access Points → VPC Service Controls
Object Notifications → Pub/Sub Notifications for Cloud Storage
Presigned URL → Signed URL
Transfer Acceleration → Storage Transfer Service
Elastic File System (EFS) → Cloud Filestore
AWS DataSync → Transfer Service for on-premises data
ETag → ETag
Bucket → Bucket
aws s3 → gsutil

Database

Relational Database Service (RDS) → Cloud SQL
DynamoDB → Cloud Datastore
ElastiCache → Cloud Memorystore
Table (DynamoDB) → Kind (Cloud Datastore)
Item (DynamoDB) → Entity (Cloud Datastore)
Partition Key (DynamoDB) → Key (Cloud Datastore)
Attributes (DynamoDB) → Properties (Cloud Datastore)
Local Secondary Index (DynamoDB) → Composite Index (Cloud Datastore)
Elastic Map Reduce (EMR) → Cloud DataProc
Athena → BigQuery
AWS Glue → Cloud DataFlow
Glue Catalog → Data Catalog
Amazon Simple Notification Service (SNS) → Cloud PubSub (push subscription)
Amazon Kinesis → Cloud PubSub
Amazon Simple Queue Service (SQS) → Cloud PubSub (poll and pull mode)

Networking & Content Delivery

Virtual Private Cloud (VPC) (Regional) → VPC Network (Global or Regional)
Subnet (Zonal) → Subnet (Regional)
Route Tables → Routes
Network ACLs (NACLS) → VPC Firewall Rules (ALLOW or DENY)
CloudFront → Cloud CDN
Route 53 → Cloud DNS/Google Domains
Direct Connect → Dedicated (or Partner) Interconnect
Virtual Private Network (VPN) → Cloud VPN
AWS PrivateLink → Google Private Access
NAT Gateway → Cloud NAT
Elastic Load Balancer → Load Balancer
AWS WAF → Cloud Armour
VPC Peering Connection → VPC Network Peering
Amazon API Gateway → Apigee API Gateway
Amazon API Gateway → Cloud Endpoints

Security, Identity, & Compliance

Root Account → Super Admin
IAM User → Member
IAM Policy → Role (Collection of Permissions)
IAM Policy Attachment → IAM Role Binding (or IAM Binding)
Key Management Service (KMS) → Cloud KMS
CloudHSM → Cloud HSM
Amazon Inspector (agent based) → Cloud Security Scanner (scan based)
AWS Security Hub → Cloud Security Command Center (SCC)
Secrets Manager → Secret Manager
Amazon Macie → Cloud Data Loss Prevention (DLP)
AWS WAF → Cloud Armour
AWS Shield → Cloud Armour

† No direct equivalent; this is the closest equivalent

If you have enjoyed this post, please consider buying me a coffee ☕ to help me keep writing!

· 3 min read
Jeffrey Aven

This article describes the steps to integrate Slack with Google Cloud Functions to get notified about object events within a specified Google Cloud Storage bucket.

Google Cloud Storage Object Notifications using Slack

Events could include the creation of new objects, as well as delete, archive or metadata operations performed on a given bucket.

This pattern could be easily extended to other event sources supported by Cloud Functions including:

  • Cloud Pub/Sub messages
  • Cloud Firestore and Firebase events
  • Stackdriver log entries

More information can be found at https://cloud.google.com/functions/docs/concepts/events-triggers.

The prerequisite steps to configure Slack are provided here:

  1. First you will need to create a Slack app (assuming you have already set up an account and a workspace). The following screenshots demonstrate this process:
Create a Slack app
Give the app a name and associate it with an existing Slack workspace
  2. Next you need to enable and activate Incoming Webhooks for your app and add this to your workspace. The following screenshots demonstrate this process:
Enable Incoming Web Hooks for the app
Activate incoming webhooks
Add the webhook to your workspace
  3. Next you need to specify a channel for notifications generated from object events.
Select a channel for the webhook
  4. Now you need to copy the webhook URL provided; you will use this later in your Cloud Function.
Copy the webhook URL to the clipboard

Treat your webhook URL as a secret; do not upload it to a public source code repository.

Next you need to create your Cloud Function. This example uses Python, but you can use an alternative runtime such as Node.js or Go.

This example templates the source code using the Terraform template_file data source. The function source code is shown here:

Within your Terraform code you need to render your Cloud Function code, substituting slack_webhook_url with its value, which you will supply as a Terraform variable. The rendered template file is then placed in a local directory along with a requirements.txt file and zipped up. The resulting zip archive is uploaded to a specified bucket where it will be sourced to create the Cloud Function.
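A sketch of the rendering and packaging steps (the file paths, variable names and bucket are illustrative and may differ from the referenced repository):

data "template_file" "function_source" {
  template = file("${path.module}/templates/main.py.tpl")
  vars = {
    slack_webhook_url = var.slack_webhook_url
  }
}

resource "local_file" "function_source" {
  content  = data.template_file.function_source.rendered
  filename = "${path.module}/build/main.py"
}

# requirements.txt is assumed to already exist in the build directory
data "archive_file" "function_archive" {
  type        = "zip"
  source_dir  = "${path.module}/build"
  output_path = "${path.module}/function.zip"
  depends_on  = [local_file.function_source]
}

resource "google_storage_bucket_object" "function_archive" {
  name   = "gcs-slack-notification.zip"
  bucket = var.source_bucket   # illustrative bucket variable
  source = data.archive_file.function_archive.output_path
}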

Now you need to create the Cloud Function; the following HCL snippet demonstrates this:
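The snippet itself isn't reproduced in this extract; continuing the sketch above, a minimal version (function and entry point names are illustrative) might be:

resource "google_cloudfunctions_function" "gcs_slack_notification" {
  name                  = "gcs-slack-notification"
  project               = var.project
  region                = var.region
  runtime               = "python37"
  entry_point           = "handle_gcs_event"   # illustrative entry point name
  source_archive_bucket = google_storage_bucket_object.function_archive.bucket
  source_archive_object = google_storage_bucket_object.function_archive.name

  event_trigger {
    event_type = "google.storage.object.finalize"
    resource   = var.watched_bucket   # the GCS bucket to watch
  }
}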

The event_trigger block in particular specifies which GCS bucket to watch and what events will trigger invocation of the function. Bucket events include:

  • google.storage.object.finalize (the creation of a new object)
  • google.storage.object.delete
  • google.storage.object.archive
  • google.storage.object.metadataUpdate

You could add additional logic to the Cloud Function code to look for specific object names or naming patterns, but keep in mind the function will fire upon every event matching the event_type and resource criteria.

To deploy the function, you would simply run:

terraform apply -var="slack_webhook_url=https://hooks.slack.com/services/XXXXXXXXX/XXXXXXXXX/XXXXXXXXXXXXXXXXXXXXXXXX"

Now, once you upload a file named test-object.txt, voilà!

Slack notification for a new object created

Full source code is available at: https://github.com/gamma-data/gcs-object-notifications-using-slack

If you have enjoyed this post, please consider buying me a coffee ☕ to help me keep writing!