Getting Started with Google Cloud Platform: A Beginner’s Guide to Cloud Excellence

📘 Chapter 5: Data Analytics and Machine Learning Tools in GCP

🔍 Overview

Google Cloud Platform (GCP) offers powerful, scalable tools for data analytics and machine learning (ML). Whether you're building dashboards, crunching massive datasets, or training neural networks, GCP provides managed services, many of them serverless, that are priced by usage and tightly integrated with Google’s ecosystem.

In this chapter, we’ll cover:

  • Core services for data analytics: BigQuery, Cloud Storage, Dataflow, Pub/Sub
  • ML tools: Vertex AI, AutoML, TPUs, and AI APIs
  • Real-world examples and use cases
  • Code snippets and visual summaries

📊 1. BigQuery: Serverless Data Warehouse

🔹 What is BigQuery?

BigQuery is a fully managed, serverless, highly scalable data warehouse designed for fast SQL analytics on large datasets.

🔹 Features

  • ANSI SQL support
  • Real-time streaming inserts
  • Automatic scaling, plus optional table partitioning and clustering
  • Integration with Looker Studio, Google Sheets, and Vertex AI

🔹 Example: Create a Dataset

```sql
-- Create dataset
CREATE SCHEMA my_dataset;
```

🔹 Example: Run SQL Query

```sql
SELECT name, COUNT(*) AS total_sales
FROM `project.dataset.sales`
GROUP BY name
ORDER BY total_sales DESC;
```

| Feature | Benefit |
| --- | --- |
| Serverless | No infrastructure to manage |
| Fast Query Engine | Columnar, distributed processing |
| Pay-per-query pricing | Only pay for processed bytes |
| Built-in ML | Train ML models via SQL |
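Pay-per-query pricing is easier to reason about with a little arithmetic. The sketch below is a rough estimate only: it assumes an on-demand rate of 6.25 USD per TiB scanned (an assumption, so check the current price for your region) and ignores any free monthly allowance. The function name `estimate_query_cost` is made up for this example.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_query_cost(bytes_processed, usd_per_tib=6.25):
    """Return the approximate on-demand cost in USD for one query.

    usd_per_tib is an assumed list price, not a value read from GCP.
    """
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / TIB * usd_per_tib


# A query that scans 250 GiB of a table
scanned = 250 * 2 ** 30
print(f"${estimate_query_cost(scanned):.2f}")
```

In practice the byte count would likely come from a dry run of the query (for example, `QueryJobConfig(dry_run=True)` in the `google-cloud-bigquery` client, then reading `total_bytes_processed` on the job) rather than a guess.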


🗃️ 2. Cloud Storage: Data Lake Foundation

Cloud Storage is an object store for structured, semi-structured, and unstructured data. It serves as the data lake layer in modern architectures.

🔹 Use Case:

  • Store raw CSV, JSON, Parquet files
  • Stream logs or ETL inputs for processing

🔹 Example: Upload File

```bash
gsutil cp localfile.csv gs://my-bucket/dataset/
```
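The same upload can be scripted from Python. This is a minimal sketch, assuming the `google-cloud-storage` client library and default credentials are available; the helper names `split_gcs_uri` and `upload_file` are hypothetical. Unlike `gsutil cp` to a folder-style path, the Python client needs the full object name, so the file name is spelled out in the URI.

```python
def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    bucket, _, object_name = uri[len(prefix):].partition("/")
    if not bucket or not object_name:
        raise ValueError(f"URI must name a bucket and an object: {uri!r}")
    return bucket, object_name


def upload_file(client, local_path, gcs_uri):
    """Upload local_path to gcs_uri using a google.cloud.storage.Client."""
    bucket_name, object_name = split_gcs_uri(gcs_uri)
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path)
    return blob


# Usage (needs google-cloud-storage installed and credentials configured):
#   from google.cloud import storage
#   upload_file(storage.Client(), "localfile.csv",
#               "gs://my-bucket/dataset/localfile.csv")
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object, which is one reasonable design choice among several.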


🔄 3. Dataflow: Real-time and Batch Processing (Apache Beam)

Dataflow is a serverless streaming and batch data processing service built on Apache Beam.

🔹 Features

  • Unified batch and stream pipelines
  • Auto-scaling and load balancing
  • Support for Python and Java SDKs

🔹 Example Use Case:

  • Cleanse and transform streaming sensor data into BigQuery

🔹 Example Python Snippet:

```python
import apache_beam as beam


def to_row(line):
    # WriteToBigQuery expects dicts keyed by column name, not lists.
    field1, field2 = line.split(',')
    return {'field1': field1, 'field2': int(field2)}


with beam.Pipeline() as pipeline:
    (
        pipeline
        | 'Read CSV' >> beam.io.ReadFromText('gs://my-bucket/data.csv')
        | 'Parse' >> beam.Map(to_row)
        | 'Write to BQ' >> beam.io.WriteToBigQuery(
            'my_project:my_dataset.my_table',
            schema='field1:STRING, field2:INTEGER',
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )
```
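The "cleanse and transform" use case usually means the parse step also has to reject bad records. The sketch below shows one way to do that as a plain generator function, so it can be tested without Beam. The CSV layout (`sensor_id,temperature_c`), the plausible temperature range, and the name `clean_sensor_line` are all assumptions made for illustration.

```python
def clean_sensor_line(line):
    """Yield one cleaned row dict per valid CSV line, nothing for bad lines.

    Assumed (hypothetical) input layout: sensor_id,temperature_c
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 2 or not parts[0]:
        return  # wrong column count or missing sensor id
    try:
        temperature = float(parts[1])
    except ValueError:
        return  # non-numeric reading
    if not -90.0 <= temperature <= 60.0:
        return  # outside the range assumed plausible for this sketch
    yield {"sensor_id": parts[0], "temperature_c": temperature}


lines = ["s1, 21.5", "s2,not-a-number", ",19.0", "s3,999", "s4,-3"]
rows = [row for line in lines for row in clean_sensor_line(line)]
print(rows)
```

Because the function yields zero or one row per input, it should slot into a pipeline as `beam.FlatMap(clean_sensor_line)` in place of a one-to-one `beam.Map` step.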


📬 4. Pub/Sub: Messaging for Analytics Pipelines

Pub/Sub is a messaging middleware used to ingest events/data into analytics systems like Dataflow, BigQuery, and Dataproc.

🔹 Use Case:

  • Ingest IoT data
  • Capture app logs
  • Real-time dashboard updates

🔹 Example:

```bash
# Create topic
gcloud pubsub topics create sales-stream

# Publish a message
gcloud pubsub topics publish sales-stream --message="sale:1234"
```
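Under the hood, a Pub/Sub message payload is bytes, and the REST API carries it as a base64-encoded `data` field. The sketch below builds and decodes such a request body for the same `sale:1234` message. It is an illustration only: the helper names and the `source` attribute are made up, and no request is actually sent.

```python
import base64
import json


def build_publish_body(payload, **attributes):
    """Build the JSON body for a Pub/Sub REST publish call with one message."""
    message = {"data": base64.b64encode(payload.encode("utf-8")).decode("ascii")}
    if attributes:
        message["attributes"] = attributes  # attribute values must be strings
    return {"messages": [message]}


def decode_message(message):
    """Recover the original text payload from a message dict."""
    return base64.b64decode(message["data"]).decode("utf-8")


# 'source' is a hypothetical attribute used only for this example
body = build_publish_body("sale:1234", source="checkout")
print(json.dumps(body))
print(decode_message(body["messages"][0]))
```

Subscribers that pull over REST receive the payload in the same encoded form, so a decode step like `decode_message` is typically the first thing a consumer does.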


🧠 5. Vertex AI: Unified ML Platform

Vertex AI is the all-in-one solution for training, tuning, deploying, and monitoring ML models on GCP.

🔹 Features:

  • Jupyter-based Workbench
  • Support for AutoML + custom models
  • Model registry + explainable AI
  • Scalable GPU/TPU training

| Tool | Use For |
| --- | --- |
| Vertex AI Workbench | Managed notebooks for experimentation |
| Vertex AI Training | Train on GPUs, TPUs |
| Vertex AI Pipelines | MLOps workflow automation |
| Vertex AI Prediction | Online or batch model serving |


🔹 Example: Submit a Custom Training Job

```bash
gcloud ai custom-jobs create \
  --region=us-central1 \
  --display-name="my_model_training" \
  --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=gcr.io/cloud-aiplatform/training/tf-cpu.2-3:latest
```
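The `--worker-pool-spec` flag packs several settings into one comma-separated string. The Vertex AI API describes the same thing as a structured worker pool spec, and the sketch below converts the flag value into that shape. The function name is hypothetical, and it only handles the three keys used in the command above.

```python
def parse_worker_pool_spec(flag_value):
    """Convert 'machine-type=...,replica-count=...,container-image-uri=...'
    into a worker pool spec dict in the shape used by the Vertex AI API."""
    options = dict(item.split("=", 1) for item in flag_value.split(","))
    return {
        "machine_spec": {"machine_type": options["machine-type"]},
        "replica_count": int(options.get("replica-count", 1)),
        "container_spec": {"image_uri": options["container-image-uri"]},
    }


spec = parse_worker_pool_spec(
    "machine-type=n1-standard-4,replica-count=1,"
    "container-image-uri=gcr.io/cloud-aiplatform/training/tf-cpu.2-3:latest"
)
print(spec)
```

A list containing this dict is the kind of value the Python SDK's `aiplatform.CustomJob` takes as `worker_pool_specs`, which is useful when a job needs to be launched from a notebook or pipeline instead of the CLI.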


🔹 Vertex AI Workbench

Interactive JupyterLab instances with BigQuery, GitHub, TensorFlow, and other integrations pre-installed.

Common libraries:

```python
import pandas as pd
from google.cloud import bigquery
```


🧠 6. BigQuery ML: Machine Learning with SQL

Use SQL to train ML models directly inside BigQuery, with no Python code or separate training infrastructure required.

🔹 Example: Train Linear Regression Model

```sql
CREATE OR REPLACE MODEL `my_dataset.sales_model`
OPTIONS(model_type='linear_reg') AS
SELECT
  feature_1,
  feature_2,
  label
FROM
  `my_dataset.sales_data`;
```

🔹 Predict:

```sql
SELECT
  feature_1,
  predicted_label
FROM
  ML.PREDICT(MODEL `my_dataset.sales_model`,
             (SELECT feature_1, feature_2 FROM `my_dataset.new_data`));
```
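For a `linear_reg` model with numeric features, each prediction boils down to an intercept plus a weighted sum of the inputs. The sketch below shows that calculation in plain Python. The weights and intercept are invented for illustration and do not come from a real model; in BigQuery the learned values can be inspected with `ML.WEIGHTS`.

```python
def predict_linear(row, weights, intercept):
    """Return intercept + sum(weight * feature) for one input row."""
    return intercept + sum(weights[name] * row[name] for name in weights)


# Hypothetical learned parameters, not output from a real model
weights = {"feature_1": 2.0, "feature_2": -0.5}
intercept = 10.0

new_data = [
    {"feature_1": 3.0, "feature_2": 4.0},
    {"feature_1": 0.0, "feature_2": 0.0},
]
for row in new_data:
    print(row["feature_1"], predict_linear(row, weights, intercept))
```

This ignores details such as feature standardization and categorical encoding, which BigQuery ML may apply automatically, so treat it as a mental model and not a reimplementation.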


🧪 7. AI APIs (Pre-trained ML Models)

GCP offers ready-to-use APIs for computer vision, natural language, translation, and speech.

| API | Use Case | Endpoint |
| --- | --- | --- |
| Vision AI | Image label/object detection | vision.googleapis.com |
| Natural Language | Sentiment analysis, syntax | language.googleapis.com |
| Translation API | Real-time translation | translate.googleapis.com |
| Speech-to-Text | Transcribe audio to text | speech.googleapis.com |

🔹 Example: Python Call to Vision API

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

image = vision.Image()
image.source.image_uri = 'gs://my-bucket/cat.jpg'

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description)
```
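Label detection returns a confidence score alongside each description, and most applications keep only the confident ones. The sketch below filters and sorts annotations by score. It uses stand-in objects with invented values so it runs without calling the API; the function name and the 0.8 threshold are assumptions.

```python
from types import SimpleNamespace


def confident_labels(annotations, min_score=0.8):
    """Return (description, score) pairs at or above min_score, best first."""
    kept = [(a.description, a.score) for a in annotations if a.score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Stand-in objects with the same two attributes as real label annotations;
# the values are invented for illustration.
fake_annotations = [
    SimpleNamespace(description="Cat", score=0.98),
    SimpleNamespace(description="Sofa", score=0.61),
    SimpleNamespace(description="Whiskers", score=0.91),
]
print(confident_labels(fake_annotations))
```

With a real response, the call would look like `confident_labels(response.label_annotations)`.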


📈 8. Looker Studio + BigQuery for Dashboards

  • Connect Looker Studio to BigQuery datasets
  • Build interactive charts with real-time analytics
  • Share insights with teams and clients securely

📋 Summary Table – GCP Analytics & ML Stack


| Tool/Service | Purpose | Language |
| --- | --- | --- |
| BigQuery | Data warehouse & SQL queries | SQL |
| Cloud Storage | Data lake layer | N/A |
| Dataflow | ETL, batch & stream pipelines | Python, Java |
| Pub/Sub | Event ingestion & real-time data | N/A |
| Vertex AI | ML model training and deployment | Python |
| BigQuery ML | SQL-based model training | SQL |
| AI APIs | Pretrained models for vision, NLP, etc. | Python |

FAQs


❓1. What is Google Cloud Platform (GCP)?

Answer:
GCP is Google’s suite of cloud computing services that provides infrastructure, platform, and serverless environments to build, deploy, and scale applications using the same technology that powers Google Search, YouTube, and Gmail.

❓2. Is Google Cloud free to use?

Answer:
Yes. GCP offers new customers $300 in free credits valid for 90 days, plus a Free Tier with monthly usage limits for services like Cloud Storage, BigQuery, and Compute Engine (one e2-micro instance in select US regions).

❓3. How do I start using GCP?

Answer:
To get started, create a Google Cloud account at cloud.google.com, set up your first project, enable billing, and explore the Console or use the gcloud CLI for resource management.

❓4. What’s the difference between Compute Engine and App Engine?

Answer:

  • Compute Engine gives you full control over virtual machines (IaaS).
  • App Engine is a fully managed PaaS that handles infrastructure, scaling, and deployments automatically.

❓5. What is a GCP project?

Answer:
A GCP project is a container for resources like VMs, buckets, APIs, and billing. It isolates services and permissions and helps organize workloads across environments.

❓6. Which programming languages are supported by GCP?

Answer:
GCP supports many languages, including Python, Java, Go, Node.js, Ruby, PHP, and C# (.NET), depending on the service used (App Engine, Cloud Functions, Cloud Run, etc.).

❓7. What tools are used to manage GCP?

Answer:
You can manage GCP via:

  • Google Cloud Console (UI)
  • Cloud Shell (browser-based CLI)
  • gcloud CLI
  • REST APIs
  • Terraform and Deployment Manager for infrastructure as code

❓8. What is BigQuery used for?

Answer:
BigQuery is a serverless data warehouse that allows you to store and analyze large datasets using SQL. It’s ideal for data analytics, reporting, and business intelligence.

❓9. Is GCP good for hosting websites?

Answer:
Yes. GCP offers multiple options to host websites:

  • Static websites via Cloud Storage, optionally fronted by Cloud CDN
  • Dynamic web apps using App Engine or Cloud Run
  • Custom VMs via Compute Engine

❓10. Does GCP offer certifications?

Answer:
Yes. Google Cloud offers certifications to validate your cloud skills, including:

  • Cloud Digital Leader (foundational)
  • Associate Cloud Engineer
  • Professional Cloud Architect
  • Professional Data Engineer, Professional Cloud DevOps Engineer, and more