What do you like best?
A lightweight performance-focused implementation and various features:
− I/O optimized: it is a columnar store with no index structures to maintain as in traditional databases. Indexing is achieved by storing the data sorted on disk, and the sorting itself runs transparently as a background process;
− Reduced data storage footprint through advanced encoding schemes (RLE, common-delta, etc.) as well as compression algorithms, plus the ability to operate directly on the encoded data;
− Queries read only the data of the columns they reference, and predicates are pushed down to the storage layer. This is very important: analytical queries on row-store databases will never be able to match it. Columns with RLE are similar to having an infinite number of partitions and sub-partitioning levels; in some cases, if multiple predicates are used with proper sort orders, it can be incredibly fast.
ANSI SQL compliant (SQL-92 and most of the SQL-99 standard); easy to extend with user-defined functions written in C++, Java, or R, which turns it into a powerful data processing engine that can easily parallelize, distribute, and partition datasets for processing (moving processing between Hadoop Pig/Hive and Vertica is very simple).
Developer friendly: from verbose explain plans and query profiles to the ability to track execution engine metrics by query paths/operators (e.g. CPU cycles used, rows processed, bytes sent over network, etc.)
Easy to set up and manage fairly large clusters. In our experience a single DBA should be able to handle many large clusters.
Very stable, easy to scale, reliable, highly available (most of the issues we had were hardware issues; we never had downtime or lost any data).
Constant addition of features and improvements (e.g. support for large data types, GIS package, flex tables, etc.).
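To make the point about RLE on sorted columns concrete, here is a minimal Python sketch. It is not Vertica's actual storage code; the function names and the sample column are made up for illustration. It assumes the column is already sorted, encodes it as runs, and then answers an equality predicate by binary search over the runs instead of scanning every row, which is roughly why a sorted RLE column behaves like a very fine-grained partitioning scheme.

```python
from bisect import bisect_left
from itertools import groupby


def rle_encode(sorted_values):
    """Collapse a sorted column into (value, start_row, run_length) triples."""
    runs = []
    row = 0
    for value, group in groupby(sorted_values):
        length = sum(1 for _ in group)
        runs.append((value, row, length))
        row += length
    return runs


def rows_matching(runs, target):
    """Return the range of row positions whose value equals target.

    Works directly on the encoded runs: a binary search over the run
    values, with no need to decode the column.
    """
    keys = [value for value, _, _ in runs]
    i = bisect_left(keys, target)
    if i < len(keys) and keys[i] == target:
        _, start, length = runs[i]
        return range(start, start + length)
    return range(0)


# Hypothetical sorted column with 9 rows and 3 distinct values.
region = ["EU"] * 4 + ["US"] * 3 + ["ZA"] * 2
runs = rle_encode(region)
us_rows = rows_matching(runs, "US")
```

With this sample column, the encoder stores three runs instead of nine values, and the predicate region = 'US' resolves to a contiguous row range that other columns can then be read by, so the non-matching rows are never touched.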
What do you dislike?
Price may be high; small startups trying to keep costs down may choose open source alternatives (e.g. HBase).
There were some stability issues at first, when certain errors were bringing down nodes, but these have been resolved for a while.
Supporting large workloads (many concurrent queries) is still not a strength of Vertica.
Loading very large data sets could use some improvement (e.g. in some cases you may have more capacity to parse and segment the data on the client side and stream the data to a specific node, thereby directly reducing load and data redistribution between nodes).
Depending on the data model used, in some cases you might have trouble optimizing the queries (large joins with large group-bys on columns across multiple relations).
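The client-side segmentation idea mentioned in the loading item above can be sketched as follows. This is only an illustration under stated assumptions: zlib.crc32 is a stand-in hash, not the hash Vertica uses, and the function name, row layout, and node count are hypothetical. A real loader would have to match the target projection's actual segmentation expression for the pre-bucketing to avoid redistribution.

```python
import zlib
from collections import defaultdict


def segment_rows(rows, key_index, num_nodes):
    """Group rows by target node using a hash of the segmentation key.

    crc32 is an assumed stand-in hash; the point is only that the client
    decides the destination node before sending any data.
    """
    buckets = defaultdict(list)
    for row in rows:
        key = str(row[key_index]).encode("utf-8")
        node = zlib.crc32(key) % num_nodes
        buckets[node].append(row)
    return dict(buckets)


# Hypothetical rows: (customer_id, payload), segmented by customer_id.
rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
buckets = segment_rows(rows, key_index=0, num_nodes=3)
```

Each bucket could then be streamed to its own node, so all rows sharing a key arrive where they will be stored and the cluster does not have to reshuffle them.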
Recommendations to others considering the product
Dr. Michael Stonebraker was the co-founder and architect (Vertica is based on the C-Store project). If you haven't heard of him, it suffices to know that he received the 2014 ACM A.M. Turing Award (announced in 2015) for his contributions to database systems.
You will need to understand its physical layer and how your queries will access and process the data to come up with the right design (Database Designer can be a great help to get you started). Then you will be amazed how fast you can do data filtering, joins, group-bys, etc. on billions of records with a handful of nodes, in minutes. On the other hand, if your queries suffer from bad segmentation, or cannot use block processing or push predicates down to the storage layer, you will not really benefit from what Vertica has to offer.
Likewise, using Vertica as a traditional OLTP database, with many small transactions inserting, deleting, and updating data, is not going to take you very far, so that’s an obvious case where Vertica is not recommended.
With all the NoSQL and NewSQL buzz, I’ve seen a misconception that SQL is old, that RDBMSs don't scale, etc. The reality is that many of these NoSQL products are adding more and more SQL-like features to stay competitive, so rest assured that SQL is here to stay.
What business problems are you solving with the product? What benefits have you realized?
Vertica is not a silver bullet, but based on my experience, in 9 out of 10 cases in which you need an analytical database, Vertica is probably the answer.
Currently we're using Vertica more as a data processing engine in conjunction with a Hadoop cluster, as some of the steps (e.g. iterative processing steps) are far more efficient and easier to manage in Vertica than in Hadoop. We also had a pretty good experience using it with Storm and Hadoop.
The main reasons I usually choose Vertica are its performance and the fact that it is fairly easy to scale and extend.