1.) Pay-per-use. It goes without saying that the ability to use large amounts of resources for a query without paying for idle capacity makes this product much better than Redshift. The ability to specify higher tiers for queries that need a lot of memory/compute relative to storage scanned is also great. (This should be exposed in the UI!)
2.) Sharding on Custom Field. Besides the obvious benefits of insert-time sharding, the ability to shard and cluster on any field is incredibly powerful. This unlocks a new category of use cases for internal enterprise apps. Although those apps are usually primarily OLAP-based, they also need to do the occasional low-latency KV lookup without forcing a full column/table scan. Custom sharding enables that.
1.) Lack of schema modifications for streaming inserts. Currently BQ can modify the schema dynamically (append columns or relax columns) for load jobs only. It would be great if it could do this for streaming inserts as well. This would unlock an entirely new set of use cases. Many companies stream event data into systems like Elastic or MongoDB because those systems gracefully handle schema drift. Many of those use cases would LOVE to switch to BigQuery if it could support schema drift.
2.) Autoschema detection needs improvement:
a.) It fails if fields are missing from the first 100 docs. There could be a (paid) option to do a full pass of the data before starting ingest. Third-party GitHub projects have been started and numerous Stack Overflow pages validate that this is a common concern.
b.) Fields with illegal characters (':', '@') cause auto schema detection to fail. There should be an option to remap illegal characters.
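Both issues can be worked around client-side today. Below is a minimal sketch (the helper names and type mappings are my own, not a BigQuery API) that scans *all* records rather than the first 100 and remaps illegal characters in field names before ingest:

```python
import re

# Hypothetical helper illustrating the two fixes requested above: a full pass
# over every record to infer the schema, plus remapping of characters that
# are illegal in BigQuery column names (e.g. ':' and '@').

ILLEGAL_CHARS = re.compile(r"[^a-zA-Z0-9_]")  # BQ columns: letters, digits, underscore

def sanitize_name(name):
    """Replace illegal characters with underscores; avoid a leading digit."""
    cleaned = ILLEGAL_CHARS.sub("_", name)
    return "_" + cleaned if cleaned[0].isdigit() else cleaned

def infer_schema(records):
    """Scan *all* records so fields absent from the first rows are still found."""
    fields = {}
    for record in records:
        for key, value in record.items():
            col = sanitize_name(key)
            if isinstance(value, bool):      # check bool before int (bool is an int subclass)
                bq_type = "BOOLEAN"
            elif isinstance(value, int):
                bq_type = "INTEGER"
            elif isinstance(value, float):
                bq_type = "FLOAT"
            else:
                bq_type = "STRING"
            # Widen to STRING when two records disagree on a field's type.
            if fields.get(col, bq_type) != bq_type:
                bq_type = "STRING"
            fields[col] = bq_type
    return [{"name": n, "type": t, "mode": "NULLABLE"} for n, t in sorted(fields.items())]

# "@timestamp" only appears in the second doc, and "user:id" needs remapping.
docs = [{"user:id": 1}, {"user:id": 2, "@timestamp": 1.5}]
print(infer_schema(docs))
```

The resulting list of `{"name", "type", "mode"}` dicts matches the JSON schema shape that `bq load` accepts, so the output can be written to a schema file and passed to the load job.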
3.) Custom sharding restricted to timestamp fields -- to shard on a non-timestamp field, I create "fake timestamps" by hashing that field. This works, but it is awkward, and the workaround is non-obvious for people who just want a point lookup without triggering a full column scan.
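For concreteness, here is a sketch of that fake-timestamp workaround. The helper name, epoch, and the 4000-day partition budget are my own assumptions, not anything BigQuery defines:

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Sketch of the "fake timestamp" workaround: hash an arbitrary shard key into
# a synthetic date so date-based partitioning can prune a point lookup to a
# single partition. NUM_PARTITIONS and EPOCH are illustrative choices.

NUM_PARTITIONS = 4000
EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

def fake_partition_date(shard_key: str) -> datetime:
    """Deterministically map a shard key onto one of NUM_PARTITIONS days."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
    return EPOCH + timedelta(days=bucket)

# At query time, recompute the fake date for the key and add it as a
# partition filter alongside the real key predicate, so only one
# partition is scanned instead of the whole table.
print(fake_partition_date("user123"))
```

The same function must be applied at both insert time and query time, which is exactly why the workaround is easy to get wrong: any drift between the two (different hash, epoch, or partition count) silently turns lookups into misses.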
4.) Lack of Scheduled Queries. Numerous Stack Overflow pages ask about materialized views / scheduled queries on BQ. Users can easily work around this with an external cron job, but that could be burdensome for a class of users, and this seems like low-hanging fruit to add.
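The cron workaround looks something like the following config fragment (the dataset, table, and query are placeholders; the `bq` CLI flags shown are standard ones):

```shell
# Hypothetical crontab entry: rebuild a summary table nightly at 02:00
# using the bq CLI. --replace overwrites the destination table each run.
0 2 * * * bq query --use_legacy_sql=false --replace \
    --destination_table=mydataset.daily_summary \
    'SELECT user_id, COUNT(*) AS events FROM mydataset.raw_events GROUP BY user_id'
```

The burden is not the cron line itself but everything around it: a machine that must stay up, credentials to rotate, and no retry/alerting unless the user builds it, which is why a native scheduled-query feature would be welcome.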
5.) Small mismatches between BQ schema and Avro schema.
a.) BigQuery schemas can be specified in a JSON format which is very close to, but not exactly, the Avro JSON schema format. Why not add an option to specify the schema as an Avro schema? This would make it easier to integrate BQ into applications by leveraging the ecosystem of Avro tooling (e.g. code generators, Avro readers/writers).
b.) Avro exports do not have a standalone schema file. This would be easy to generate and attach to the export and would make it easier to load BigQuery exports. (For example, the exports could be loaded into Spanner, which requires an Avro schema as a standalone file, or read with Avro readers that require a schema at initialization time.)
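To show how close the two formats already are, here is a minimal sketch of an Avro-to-BigQuery schema translation. It is my own illustration, not an official mapping: only a few primitive types are handled, and records, logical types, and unions other than `["null", T]` are omitted for brevity.

```python
# Sketch: translate the fields of an Avro record schema into BigQuery's
# JSON schema shape. Avro expresses nullability as a union with "null";
# BigQuery expresses it with a "mode" of NULLABLE vs REQUIRED.

AVRO_TO_BQ = {
    "string": "STRING",
    "int": "INTEGER",
    "long": "INTEGER",
    "float": "FLOAT",
    "double": "FLOAT",
    "boolean": "BOOLEAN",
    "bytes": "BYTES",
}

def avro_field_to_bq(field):
    avro_type = field["type"]
    mode = "REQUIRED"
    if isinstance(avro_type, list):          # ["null", T] union -> NULLABLE T
        avro_type = [t for t in avro_type if t != "null"][0]
        mode = "NULLABLE"
    return {"name": field["name"], "type": AVRO_TO_BQ[avro_type], "mode": mode}

avro_schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "payload", "type": ["null", "string"]},
    ],
}
bq_schema = [avro_field_to_bq(f) for f in avro_schema["fields"]]
print(bq_schema)
```

That the whole translation fits in a page of code is the point: the formats are near-isomorphic for the common cases, so accepting Avro schemas directly (and shipping one alongside each export) seems like a small lift.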