
3 posts tagged with "stackql"


Jeffrey Aven · One min read

In the stackql project we needed an API to serve configuration file packages (stackql providers) to the stackql application at runtime.

Traditional artifact repositories or package managers were unsuitable, as they are mainly designed for container images, JavaScript modules, Python packages, and the like. The artifacts in this case are signed tarball sets of OpenAPI specification documents (text files).

We have recently moved our provider registry (stackql-provider-registry) to use Deno Deploy as the serving layer (the API).

The code

The code is reasonably straightforward as shown here:
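
The full source is in the stackql-provider-registry repo; the sketch below is a minimal approximation of the serving layer, and the route layout and file paths are assumptions for illustration.

import { serve } from "https://deno.land/std@0.177.0/http/server.ts";

serve(async (req: Request) => {
  const { pathname } = new URL(req.url);
  // e.g. /providers/github/v0.1.0/github.tgz (hypothetical layout)
  if (pathname.startsWith("/providers/")) {
    try {
      // static assets are uploaded alongside the code at deploy time
      const asset = await Deno.readFile(`.${pathname}`);
      return new Response(asset, {
        headers: { "content-type": "application/gzip" },
      });
    } catch {
      // asset not found; fall through to the 404 below
    }
  }
  return new Response("Not found", { status: 404 });
});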

The deployment

We are using GitHub Actions to push assets and code to Deno Deploy; this was straightforward as well, as you can see here:
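
A sketch of such a workflow using the denoland/deployctl action is shown below; the project name and entrypoint are assumptions.

name: deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for Deno Deploy OIDC authentication
      contents: read
    steps:
      - uses: actions/checkout@v3
      - uses: denoland/deployctl@v1
        with:
          project: stackql-registry   # hypothetical project name
          entrypoint: main.ts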

If you have enjoyed this post, please consider buying me a coffee ☕ to help me keep writing!

Yuncheng Yang · 3 min read

It is common to have a remote and dispersed team these days. With face-to-face meetings less frequent, and often not possible for geographically dispersed development teams, it is challenging to get a clear picture of where your team is at.

GitHub provides useful data to help you understand your development team's workload and progress. StackQL has an official GitHub provider which allows you to access this data using SQL.

info

StackQL is an open-source project which enables you to query, analyze, and interact with cloud and SaaS provider resources using SQL; see stackql.io

In this example, we will use the pystackql Python package (a Python wrapper for StackQL) along with a Jupyter notebook to retrieve data from GitHub using SQL, then sink the data into a cloud native data warehouse for long term storage and analytics at scale; in this case we have used BigQuery.

Step by Step Guide

This guide will walk you through the steps involved in capturing and analyzing developer data using StackQL, Python, Jupyter, and BigQuery.

1. Create a GitHub Personal Access Token

You will need to create a Personal Access Token in GitHub for a user who has access to the GitHub org or orgs you will be analyzing. Follow this guide to create your GitHub token and store it somewhere safe.

2. Set up your Jupyter Notebook

You need to set up your Jupyter environment; you can either use Docker (see stackql/stackql-jupyter-demo) or complete the following steps (a sketch is shown after this list):

  1. Create your Jupyter project
  2. Download and install StackQL
  3. Clone the pystackql repo
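
A sketch of these steps on Linux is shown below; the StackQL download URL reflects the docs at the time of writing and may change.

pip install notebook                                  # 1. create your Jupyter project
curl -L https://releases.stackql.io/stackql/latest/stackql_linux_amd64.zip \
  -o stackql.zip && unzip stackql.zip                 # 2. download and install StackQL
git clone https://github.com/stackql/pystackql.git    # 3. clone the pystackql repo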

3. Set up StackQL Authentication to GitHub

You can find instructions on how to use your personal access token to authenticate to GitHub here. The following example shows how to do this in a Jupyter notebook cell using pystackql.
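
A sketch of such a cell is shown below. The auth string follows the StackQL GitHub provider conventions of the time; the constructor argument and environment variable name are assumptions, so check the pystackql README for your version.

import base64, os
from pystackql import StackQL

# the StackQL GitHub provider uses basic auth: base64("username:token")
os.environ["GITHUB_CREDS"] = base64.b64encode(
    b"your-github-username:ghp_your_token"
).decode()

github_auth = '{"github": {"type": "basic", "credentialsenvvar": "GITHUB_CREDS"}}'
stackql = StackQL(auth=github_auth)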

4. Retrieve data

Next, we will use StackQL SQL queries to get commits, pull requests, and pull request reviews, then aggregate by the usernames of contributors. You can use JOIN semantics in StackQL to do this as well.

Get Contributors, Commits, Pull Requests and Reviews

In the following cell we will query data from GitHub using StackQL:
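
A sketch of such a cell is shown below; the resource and column names are illustrative, so check the StackQL GitHub provider docs for the current schema. StackQL embeds SQLite, so functions like json_extract are available.

import pandas as pd

# count commits per author for a given repo (illustrative resource and columns)
commits_query = """
SELECT json_extract(commit, '$.author.name') AS username,
       COUNT(*) AS num_commits
FROM github.repos.commits
WHERE owner = 'stackql' AND repo = 'stackql'
GROUP BY username
"""

# depending on the pystackql version, execute() returns JSON or a DataFrame
commits_df = pd.read_json(stackql.execute(commits_query))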

Aggregate Data By Username

Now we will aggregate the data by each contributor, as shown in the following example:
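
Assuming the commit, pull request, and review queries above were each loaded into a DataFrame keyed by username, a sketch of the aggregation looks like this (frame and column names are assumptions):

# outer-join the per-user frames so contributors missing from one
# activity type are still retained, then replace the resulting NaNs
user_activity = (
    commits_df.merge(pulls_df, on="username", how="outer")
              .merge(reviews_df, on="username", how="outer")
              .fillna(0)
)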

5. Store the Data in BigQuery

After transforming the data, we will upload it to BigQuery. First, we will store the data as a newline delimited JSON file, which makes the upload process easier and handles the nested schema better, as shown in the following cell:
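
A sketch of the export and load using the google-cloud-bigquery client is shown below; the project, dataset, and table names are hypothetical.

from google.cloud import bigquery

# write the aggregated frame as newline delimited JSON
user_activity.to_json("user_activity.json", orient="records", lines=True)

client = bigquery.Client()
table_id = "your-project.github_analytics.user_activity"  # hypothetical
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # infer the (nested) schema from the file
)

with open("user_activity.json", "rb") as f:
    load_job = client.load_table_from_file(f, table_id, job_config=job_config)
load_job.result()  # wait for the load job to complete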

Now we can see the table in BigQuery, as shown here:

BigQuery User Activity Table

From here you can use the same process to append data to the table and use BigQuery to perform analytics at scale on the data.

info

The complete notebook for this article can be accessed at FabioYyc/stackql-github-notebook-bq

Jeffrey Aven · 2 min read

OpenAPI specifications can get quite large, especially for providers with upwards of 500 routes or operations.

The challenge is to create standalone documents scoped by a service or path within the parent API specification and include only the components (schemas, responses, etc.) that pertain to operations included in the child document.

When I went looking for a library or utility to do this, I couldn't find one... so I developed one myself.

It's a simple command (Node.js based, but it can be run from a bash terminal or the Windows command line) which requires a few options, including:

  • the provider name (e.g. github)
  • a provider version, which is a version you set, allowing you to make minor modifications to the output documents (e.g. v0.1.0)
  • a service discriminator, which is a JSONPath expression used to identify a service name within each route in the parent file; this is used to assign operations to services in separate documents (e.g. '$["x-github"].category')
  • an output directory (e.g. ./dev)

and of course, the openapi spec document you are splitting up.

An example is shown here:

openapi-doc-util split \
-n github \
-v v0.1.0 \
-s '$["x-github"].category' \
-o ./dev \
ref/github/api.github.com.yaml

Help for the command is available using openapi-doc-util split.

The net result is 59 self-contained, service-scoped documents, each containing only the components referenced by the routes in that service's document.

You can access this utility via NPMJS or via GitHub.

Splitting up a large OpenAPI spec document is the first stage in developing a StackQL provider, which we will discuss next time!

If you have enjoyed this post, please consider buying me a coffee ☕ to help me keep writing!