What do you like best?
AWS Data Pipeline makes it very easy to get started moving data between various sources. If you're already using AWS services such as S3 or Redshift, Data Pipeline greatly reduces the amount of custom code and supporting applications needed to move data between AWS data sources.
You can simply run a pipeline on a schedule, specify the source and destination you want to move data between, and have it run a series of processing steps if you need to filter your data (a rough sketch follows below). It is a good, simple way to do basic data movement.
Data Pipeline retries failed tasks, surfaces errors, and has a dashboard that shows you all of the jobs in the queue.
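As a minimal sketch of that workflow, this is roughly what defining and activating a scheduled S3-to-S3 copy looks like through boto3. The pipeline name, schedule, bucket paths, and object IDs are hypothetical examples, not taken from our setup.

import boto3

client = boto3.client("datapipeline", region_name="us-east-1")

# Create an empty pipeline shell; uniqueId guards against duplicate creation.
pipeline = client.create_pipeline(name="daily-s3-copy", uniqueId="daily-s3-copy-v1")
pipeline_id = pipeline["pipelineId"]

# Define the schedule, the source and destination data nodes, and a copy activity.
client.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        {"id": "Default", "name": "Default", "fields": [
            {"key": "scheduleType", "stringValue": "cron"},
            {"key": "schedule", "refValue": "DailySchedule"},
            {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
        ]},
        {"id": "DailySchedule", "name": "DailySchedule", "fields": [
            {"key": "type", "stringValue": "Schedule"},
            {"key": "period", "stringValue": "1 day"},
            {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
        ]},
        {"id": "Source", "name": "Source", "fields": [
            {"key": "type", "stringValue": "S3DataNode"},
            {"key": "directoryPath", "stringValue": "s3://example-bucket/raw/"},
        ]},
        {"id": "Dest", "name": "Dest", "fields": [
            {"key": "type", "stringValue": "S3DataNode"},
            {"key": "directoryPath", "stringValue": "s3://example-bucket/staged/"},
        ]},
        {"id": "Copy", "name": "Copy", "fields": [
            {"key": "type", "stringValue": "CopyActivity"},
            {"key": "input", "refValue": "Source"},
            {"key": "output", "refValue": "Dest"},
            {"key": "runsOn", "refValue": "Ec2Instance"},
        ]},
        {"id": "Ec2Instance", "name": "Ec2Instance", "fields": [
            {"key": "type", "stringValue": "Ec2Resource"},
            {"key": "terminateAfter", "stringValue": "1 Hour"},
        ]},
    ],
)

# Activate so scheduled runs begin; retries and errors then show up in the console.
client.activate_pipeline(pipelineId=pipeline_id)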
What do you dislike?
Data Pipeline can be a black box at times. Error messages are poor, and because it is a managed Amazon service it is difficult to understand what exactly failed. The scheduler doesn't always give timely notifications, so it can be hard to determine the true state of the data pipeline.
Recommendations to others considering the product
If you are a small company and don't want to spend resources building your own pipeline, use AWS Data Pipeline. It works out of the box, gets the job done, and you can learn to work around some of its deficiencies. It provides a great way to schedule the movement of data.
What business problems are you solving with the product? What benefits have you realized?
We have various ETL jobs that run on Data Pipeline. We move data from various sources into a data warehouse for final reporting: Data Pipeline moves everything into a single data lake, then runs pre-processing steps before loading into the warehouse (sketched below).
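As a hypothetical sketch of that staging pattern (raw data landing in the lake, a pre-processing step, then a load into the warehouse), the pipeline objects might look like the following. filter_rows.sh, the object IDs, and the ReportingTable Redshift node are illustrative, not our actual setup; this list would be passed to put_pipeline_definition alongside the data node and resource objects.

preprocess_and_load = [
    {"id": "Preprocess", "name": "Preprocess", "fields": [
        {"key": "type", "stringValue": "ShellCommandActivity"},
        # With stage=true, Data Pipeline stages the lake data into a local
        # directory, runs the filter script, and copies the output back to S3.
        {"key": "command", "stringValue": "filter_rows.sh ${INPUT1_STAGING_DIR} ${OUTPUT1_STAGING_DIR}"},
        {"key": "stage", "stringValue": "true"},
        {"key": "input", "refValue": "LakeData"},       # S3DataNode for the data lake
        {"key": "output", "refValue": "CleanData"},     # S3DataNode for cleaned data
        {"key": "runsOn", "refValue": "Ec2Instance"},
    ]},
    {"id": "LoadWarehouse", "name": "LoadWarehouse", "fields": [
        {"key": "type", "stringValue": "RedshiftCopyActivity"},
        {"key": "input", "refValue": "CleanData"},
        {"key": "output", "refValue": "ReportingTable"},  # a RedshiftDataNode
        {"key": "insertMode", "stringValue": "TRUNCATE"},
        {"key": "dependsOn", "refValue": "Preprocess"},
        {"key": "runsOn", "refValue": "Ec2Instance"},
    ]},
]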