Managing query performance speed and low execution cost can be challenging, primarily when operating on large tables (100T+) with a dozen to hundreds of repeated fields.

Repeated fields are an elegant way of representing and accessing denormalized data without expensive joins, typical for the relational schema. While they perform great, BigQuery does not provide a partitioning or clustering option for the repeated field, which results in potentially costly a full data scan. Take a table with a repeated x1 column as an example; assume that table in question stores 1TB in x1 column; the query would use complete 1TB regardless…

Adrian Witas

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store