Apache Pig

(15)
3.8 out of 5 stars

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.


Apache Pig Reviews

Showing 15 Pig reviews
Pig review by Gökhan E.
Validated Reviewer
Verified Current User

"Big data? No problem with the operations!"

What do you like best?

I used Apache Pig at my part-time job, which handles big data, and Pig scripts helped me a lot. I created custom functions, which made complex, large tasks easier to handle and easier to maintain after configuration. Pig's optimization of script jobs also let me focus on semantics. The default mode, MapReduce mode, is very efficient.

What do you dislike?

Sometimes I feel that our data is not big enough to need Pig scripts. The documentation makes me sweat, and it takes a lot of time to get used to.

What business problems are you solving with the product? What benefits have you realized?

I was using it on an ad platform where our servers were receiving too many requests and too much data. Pig scripts helped me a lot in targeting this data.

Pig review by User in Higher Education
Validated Reviewer
Verified Current User

"A very good big data solution for querying"

What do you like best?

1. SQL-like syntax.

2. Ease of use.

3. Short learning curve

4. Ease of maintenance

5. Decrease in development time. This is the biggest advantage especially considering vanilla map-reduce jobs' complexity, time-spent and maintenance of the programs.

What do you dislike?

1. Slow for larger queries

2. Error messages need to be better

3. Support is limited

4. Source and Sink need to be present

5. The errors that Pig produces for Python UDFs are especially unhelpful. When something goes wrong, it just reports an exec error in the UDF, even if the problem is a syntax or type error, let alone a logical one. This is a big one.
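One way to soften the unhelpful UDF errors described above is to validate inputs inside the UDF and raise messages that name the function and the offending value. The following is only a minimal sketch under stated assumptions: `extract_domain` is a hypothetical UDF invented for illustration, the fallback `outputSchema` decorator exists only so the file can be run and tested outside Pig with plain Python 3, and how much of the message Pig surfaces in its own log depends on the Pig version and UDF mode.

```python
# Sketch only: a hypothetical Python UDF that fails with descriptive messages.
try:
    # pig_util is the helper module Pig provides for Python UDFs (assumption:
    # it is importable in the environment where the UDF runs).
    from pig_util import outputSchema
except ImportError:
    # Fallback for local testing outside Pig: a decorator that only records the schema.
    def outputSchema(schema):
        def decorator(func):
            func.output_schema = schema
            return func
        return decorator


@outputSchema("domain:chararray")
def extract_domain(url):
    """Hypothetical UDF: return the lowercased host part of a URL."""
    if url is None:
        return None  # pass nulls through instead of crashing on them
    if not isinstance(url, str):
        # Name the UDF and show the bad value so the log points at the real cause.
        raise TypeError("extract_domain expected a string, got %s: %r"
                        % (type(url).__name__, url))
    rest = url.split("://", 1)[-1]   # drop the scheme if there is one
    host = rest.split("/", 1)[0]     # keep everything before the first slash
    if not host:
        raise ValueError("extract_domain could not find a host in %r" % url)
    return host.lower()
```

Running the function on a few sample values locally, before registering it in a Pig script, catches syntax and type errors where the Python traceback is still readable.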

Recommendations to others considering the product

If you have UDFs that you want to parallelize and apply to large amounts of data, you are in luck. Use Pig as a base pipeline where it does the hard work and you just apply your UDF in the step that you want.

Lazy evaluation: unless you produce an output file or output a message, nothing gets evaluated. This benefits the logical plan: the optimizer can look at the program from beginning to end and produce an efficient execution plan.
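The lazy-evaluation idea can be illustrated with a toy model. This is not Pig's implementation or API: `LazyRelation`, `foreach`, `filter`, `explain` and `store` are hypothetical names chosen to echo Pig Latin operators, and the sketch only shows that building a pipeline records steps while the output step is what triggers the work.

```python
class LazyRelation:
    """Toy stand-in for a relation: records steps, runs nothing until store()."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)

    def filter(self, predicate):
        # Record the step and return a new plan; no rows are touched here.
        return LazyRelation(self.source, self.steps + (("filter", predicate),))

    def foreach(self, func):
        return LazyRelation(self.source, self.steps + (("foreach", func),))

    def explain(self):
        # The whole plan is known before execution, so it could be optimized.
        return ["load"] + [name for name, _ in self.steps] + ["store"]

    def store(self):
        # Only the output step walks the recorded plan and processes rows.
        rows = iter(self.source)
        for name, fn in self.steps:
            rows = filter(fn, rows) if name == "filter" else map(fn, rows)
        return list(rows)


evaluated = []

def double(x):
    evaluated.append(x)  # side effect lets us observe when work happens
    return x * 2

plan = LazyRelation(range(10)).filter(lambda x: x % 2 == 0).foreach(double)
assert evaluated == []   # defining the pipeline ran nothing
result = plan.store()    # the output step triggers evaluation
print(plan.explain())    # ['load', 'filter', 'foreach', 'store']
print(result)            # [0, 4, 8, 12, 16]
```

Because the full list of steps exists before anything runs, an optimizer has the chance to reorder or merge them, which is the advantage the paragraph above describes.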

It enjoys everything that Hadoop offers: parallelization and fault tolerance, along with many relational database features.

It is also a good fit if you want to apply some statistics to your dataset. The functional programming paradigm fits pipeline processes quite naturally, so I expect it to be quite successful.

What business problems are you solving with the product? What benefits have you realized?

Data analysis of the raw data we have. Initial data exploration with Pig has been useful.

Pig review by User in Marketing and Advertising
Validated Reviewer

"Apache Pig"

What do you like best?

1. Ease of use, its performance

2. MapReduce is fully abstracted

3. Ability to chain multiple MR jobs into a single Pig script

4. Allows you to quickly crank through big data to get some analytics done

What do you dislike?

1. Slower in performance compared to Spark

2. Limited support, e.g. string concatenation only takes 2 arguments at a time, you cannot sort and filter inside GROUP BY, etc.

3. Cannot read other forms of input such as CSV or Parquet, which Spark can do

4. Error handling needs to be better. Not easy to debug UDFs

Recommendations to others considering the product

Definitely a good starting point for writing quick big data applications. Anyone who has experience writing queries and basic programming experience in Java should be able to pick this language up in a short time. It's really useful to learn and makes ad-hoc analytics very convenient.

What business problems are you solving with the product? What benefits have you realized?

A few of our proprietary data pipelines involving batch processing are written in Pig. Programmers can focus on writing the core analytics logic rather than worrying about the many mappers/reducers for each intermediate sub-task.

Pig review by Anson A.
Validated Reviewer
Verified Current User

"Apache Pig Review"

What do you like best?

creating UDAFs easily.

Pig Latin is manageable and easy to write

can be streamed through python and scripted out vs writing an MR job

What do you dislike?

not as truly scalable as writing an MR job.

joins are easy, but not as easy as hive queries

doesn't handle parquet really well

not as fast and flexible as spark

What business problems are you solving with the product? What benefits have you realized?

main process pipeline flows are using pig.

creating multiple UDAF/UDFs as well as other jar libraries that only pig and hive can handle

Pig review by Stirling N.
Validated Reviewer

"Apache Pig - Faster execution"

What do you like best?

Apache Pig is a 1st pass compiler, which is at its best using DAG.

What do you dislike?

If you want to drill down and use complex structures, it is not the best way.

Recommendations to others considering the product

For a great purpose it is the right tool; finding that out, however, is a trickier business.

What business problems are you solving with the product? What benefits have you realized?

If you do not know the structure in advance, then a DAG and declared execution plans may be the best way to find it out; then use SQL once the plan is known.

Pig review by Administrator in Internet
Validated Reviewer

"Analyzing large data can be so easy with this tool!"

What do you like best?

Pig is a great high-level scripting language for working with big datasets under Apache's open-source Hadoop project. It lets you transform data operations and optimize them into MapReduce, something that can be challenging with other platforms.

I recommend this tool to clients who need to manage a large list of users that load a considerable amount of data daily. It can help you clean and search data and declare independent execution plans easily.

You can compare this tool with SQL programming, but the way it uses UDFs lets you easily call functions written in Java, JavaScript, Python and, of course, Ruby.

What do you dislike?

At the beginning it was a bit difficult to get used to working in its Pig Latin language; however, there is very good documentation online that helps you manage the process.

Apache Pig has many competitors, so the system will need to be optimized, because sometimes the scripts won't get you the ideal results.

Recommendations to others considering the product

Learn Pig Latin and be ready to have an easy day of work

What business problems are you solving with the product? What benefits have you realized?

My clients normally process big data sets that contain mainly JSON objects; with Pig they are able to work through very convoluted data sets.

* We monitor all Apache Pig reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. Validated reviews require the user to submit a screenshot of the product containing their user ID, in order to verify a user is an actual user of the product.