Hadoop HDFS

4.3 out of 5 stars (81)

Hadoop HDFS is a distributed, scalable, and portable filesystem written in Java.

Hadoop HDFS Reviews

Showing 85 Hadoop HDFS reviews

Hadoop HDFS review by Akash S.
Validated Reviewer
Verified Current User

"Hadoop Ecosystem: THE BEST FOR BIG DATA ANALYTICS."

What do you like best?

It is the Hadoop Distributed File System, which is written entirely in Java. It provides high scalability and redundancy. It stores both structured and unstructured data, and it also provides really fast data retrieval. Hadoop has a large community behind it, so many problems are solved instantly.

What do you dislike?

It is not a useful tool for beginners who want to make a career in Big Data Analytics. MongoDB is still easier to use, and there are many more tutorials available for learning MongoDB. Because the tool is hosted locally, it sometimes falls short. The UI is also not up to the mark.

Recommendations to others considering the product

From startups to enterprises, use Hadoop HDFS for modern, best-in-class data analysis. It lets you easily store structured as well as unstructured data and lets users retrieve it easily for further use. Hadoop has a large number of software tools that support various functionalities in Big Data Analytics, like Flume, Hive, Pig (Pig Latin), Mahout, etc. Machine learning can also be done on big data in the Hadoop ecosystem.

What business problems are you solving with the product? What benefits have you realized?

As the name suggests, Hadoop HDFS (Hadoop Distributed File System) allows users to store any large amount of data, which can then be easily used for analysis. All of my business analytics are done through Hadoop, which helps the business grow.

Hadoop HDFS review by Prabhudayal A.
Validated Reviewer
Verified Current User

"HDFS: A New-Era File System"

What do you like best?

1. Storing files in the sequence file format, using key-value pairs. It stores the file name as the key and the content as the value, optionally compressed.

2. 128 MB block size. Previously it was 64 MB, which was too small, though some people still like the old block size.

3. Storing multiple copies of data. The same data is stored on multiple nodes, which helps when a single node fails. As we all know, the whole purpose of HDFS is to use commodity hardware and keep the service available.

4. Horizontal scaling and distributed architecture. This lets the system grow without worrying about the existing data. You can increase the number of nodes when the amount of data changes drastically.
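Items 2 and 3 above can be turned into a quick calculation. The following is a minimal Python sketch, not Hadoop code: it assumes the Hadoop 2+ default block size of 128 MB (`dfs.blocksize`) and the default replication factor of 3 (`dfs.replication`), and the function name `block_layout` is hypothetical. It splits a file into blocks and estimates the raw disk space the replicas occupy across the cluster.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB  # assumed: Hadoop 2+ default for dfs.blocksize
DEFAULT_REPLICATION = 3        # assumed: default for dfs.replication


def block_layout(file_size, block_size=DEFAULT_BLOCK_SIZE,
                 replication=DEFAULT_REPLICATION):
    """Split a file of file_size bytes into HDFS-style blocks (hypothetical helper).

    Returns (list of block sizes, raw bytes stored across all replicas).
    The last block only holds the remaining bytes, so a small file does
    not consume a full 128 MB block on disk.
    """
    if file_size == 0:
        return [], 0
    full_blocks, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full_blocks
    if remainder:
        blocks.append(remainder)
    # Each block is stored on `replication` different DataNodes.
    raw_bytes = sum(blocks) * replication
    return blocks, raw_bytes


blocks, raw = block_layout(300 * MB)
print(len(blocks), [b // MB for b in blocks], raw // MB)  # 3 [128, 128, 44] 900
```

Replication triples the raw storage in this model, which is the price paid for surviving the failure of a single node.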

What do you dislike?

1. MapReduce jobs take a long time even for small amounts of data, because map and reduce tasks are created for every job. It's recommended to use a relational database for small amounts of data, in the GB range.

2. The immutable nature. Files in HDFS cannot be altered; more specifically, they cannot be modified in place. Appending is allowed, though.
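The write-once, append-only behavior described in item 2 can be modeled with a small toy class. This is a hypothetical Python sketch, not part of any Hadoop API: `WriteOnceFile` and its methods are invented for illustration. It assumes HDFS's single-writer rule (enforced in real HDFS through a lease held by the writing client) and simply refuses in-place writes.

```python
class WriteOnceFile:
    """Toy model of HDFS file semantics: one writer, append-only, no in-place edits.

    Hypothetical class for illustration only; it is not part of any Hadoop API.
    """

    def __init__(self):
        self._data = bytearray()
        self._writer = None  # client currently holding the (simulated) write lease

    def open_for_append(self, client):
        # Only a single writer may hold the file open at a time.
        if self._writer is not None and self._writer != client:
            raise PermissionError(f"{self._writer} already holds the write lease")
        self._writer = client

    def append(self, client, chunk: bytes):
        if self._writer != client:
            raise PermissionError("client must open the file for append first")
        self._data.extend(chunk)

    def close(self, client):
        # Releasing the lease lets another client append later.
        if self._writer == client:
            self._writer = None

    def write_at(self, offset, chunk: bytes):
        # No random writes: bytes already written can never be overwritten.
        raise NotImplementedError("HDFS files cannot be modified in place")

    def read(self) -> bytes:
        return bytes(self._data)


f = WriteOnceFile()
f.open_for_append("client-a")
f.append("client-a", b"hello ")
f.append("client-a", b"world")
f.close("client-a")
print(f.read())  # b'hello world'
```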

Recommendations to others considering the product

Yes, it is recommended for bulk data.

What business problems are you solving with the product? What benefits have you realized?

We have developed a module to push .eml files to HDFS and store them in a sequence file.

Data retrieval is much faster compared to a traditional file system.
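Packing many small .eml files into one sequence file works because each email becomes one key/value record inside a single large HDFS file. The sketch below is a simplified Python stand-in, assuming file names as keys and raw bytes as values; it is not the real Hadoop SequenceFile format, which also has a header, sync markers, and optional compression, and the helper names `pack_records` and `unpack_records` are hypothetical.

```python
import io
import struct


def pack_records(files):
    """Pack {name: bytes} into one length-prefixed key/value byte stream.

    Simplified stand-in for a Hadoop SequenceFile (hypothetical helper): the
    real format also has a header, sync markers and optional compression.
    """
    out = io.BytesIO()
    for name, content in files.items():
        key = name.encode("utf-8")
        # Record layout: key length, value length (big-endian uint32), key, value.
        out.write(struct.pack(">II", len(key), len(content)))
        out.write(key)
        out.write(content)
    return out.getvalue()


def unpack_records(blob):
    """Yield (name, bytes) pairs back out of a stream built by pack_records."""
    offset = 0
    while offset < len(blob):
        key_len, value_len = struct.unpack_from(">II", blob, offset)
        offset += 8
        key = blob[offset:offset + key_len].decode("utf-8")
        offset += key_len
        value = blob[offset:offset + value_len]
        offset += value_len
        yield key, value


emails = {"a.eml": b"Subject: hi\r\n\r\nbody", "b.eml": b"Subject: re\r\n\r\nok"}
blob = pack_records(emails)
print(dict(unpack_records(blob)) == emails)  # True
```

One large file of records means the NameNode tracks a single file instead of one entry per email.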

Hadoop HDFS review by Roshan M.
Validated Reviewer
Verified Current User

"Big Data Storage"

What do you like best?

HDFS is fault tolerant because it makes multiple replicas of the data stored in it, which makes it more reliable. Compared with traditional file systems, it is also much more robust and efficient when working on bulk amounts of data, since that data can be processed with MapReduce in the back end. Another major advantage is that HDFS runs easily on commodity hardware, which keeps the initial setup cost very low. The commands used to manage files in HDFS are almost the same as those used in the shell, which makes it easy to pick up.

What do you dislike?

One of the major drawbacks of the Hadoop file system is that if the amount of data to be dealt with is small, it won't be efficient, as the time taken to process small data is comparable to the time taken for bulk data.

By default, the security measures in Hadoop File system are disabled, making it insecure for data storage.

Recommendations to others considering the product

Those dealing with bulk amounts of data in offline applications can surely go for the Hadoop file system, but for those dealing with small amounts of real-time data, it is not recommended.

What business problems are you solving with the product? What benefits have you realized?

We are moving data from a traditional file system to Hadoop storage, as the amount of data increases day by day, so we require a platform to manage this bulk data in a better, more robust, and cost-efficient manner.

Thus HDFS acts as the base storage for our historical records.

Hadoop HDFS review by Administrator
Validated Reviewer
Verified Current User

"Brilliant for data mining across large cache RDBMS"

What do you like best?

Hadoop can take loads of data from disparate sources quickly and performs well under testing performance conditions with multi-server configurations.

Hadoop is customizable, so nearly all of our business objectives can be met with the right combination of data and reports.

It is a very scalable product, handling a virtually unlimited number of rows and a large number of parallel processors through dynamic clustering. The product is also very economical in comparison to SAS.

What do you dislike?

There is less of an organizational support system. Bugs that need outside help take a long time to get fixed in an update.

It does not come with much business knowledge, so the container needs a lot of programming to be usable for a specific use case.

Recommendations to others considering the product

Use it if you have a heavy dataset (most other solutions will not work in this case). It is economical in the medium to long term, although the implementation cost is higher than for most other cloud-based solutions.

What business problems are you solving with the product? What benefits have you realized?

Using HDFS for storage and processing.

Intermediate data filtering as middleware.

Hadoop HDFS review by swetha m.
Validated Reviewer

"Data Lake HDFS"

What do you like best?

1) A distributed file system helps partition huge data across multiple machines, which makes it possible to store petabytes of data. It follows the WORM (write once, read many) model.

2) We have a master node (the NameNode) that distributes data among the nodes and maintains the file metadata, such as versions and paths, which makes it easier to locate files.

3) Data loss: as data is stored on multiple DataNodes, there is a replica in case of any failure, so there is very little chance of losing data.

4) Reading, copying, and moving files to HDFS with shell commands (for example, over a PuTTY session) is easy.

5) Apache Ambari provides a user interface for the Hadoop ecosystem, which makes it easier to download, copy, rename, and move files and directories in HDFS and to change their permissions.

6) The use of checksums for data integrity helps detect data corruption.
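Item 6 can be illustrated with a short sketch of per-chunk checksums. It assumes the HDFS default of one checksum per 512 bytes (`dfs.bytes-per-checksum`). Real HDFS uses CRC32C by default; this Python sketch substitutes `zlib.crc32` (plain CRC32) only because it is in the standard library, and the function names are hypothetical.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed: HDFS default for dfs.bytes-per-checksum


def chunk_checksums(data: bytes, chunk_size=BYTES_PER_CHECKSUM):
    """Compute one CRC per fixed-size chunk of data (hypothetical helper).

    HDFS uses CRC32C by default; zlib.crc32 (plain CRC32) is a stand-in
    chosen only because it ships with the Python standard library.
    """
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def find_corrupt_chunks(data: bytes, expected, chunk_size=BYTES_PER_CHECKSUM):
    """Return the indexes of chunks whose checksum no longer matches."""
    actual = chunk_checksums(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


block = bytes(range(256)) * 8          # 2048 bytes -> 4 chunks of 512
stored = chunk_checksums(block)        # computed when the block is written
damaged = bytearray(block)
damaged[1000] ^= 0xFF                  # flip one byte inside chunk 1
print(find_corrupt_chunks(bytes(damaged), stored))  # [1]
```

Because each chunk has its own checksum, a reader can tell which part of a block is corrupt and fetch that block from another replica instead.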

What do you dislike?

1) The NameNode is not replicated unless high availability is configured, so recovering from a NameNode failure takes a lot of time.

2) Because storage is organized in large blocks, storing small files is not efficient.

3) It doesn't allow multiple users to write to a file at the same time.

Recommendations to others considering the product

1) HDFS is a file system with huge capacity that can store your files in a distributed manner across multiple networked machines.

2) It follows the "write once, read many times" principle, along with replication of data across DataNodes.

3) We have a master node to distribute data among nodes and maintain the file metadata, such as versions and paths, which makes it easier to locate files.

What business problems are you solving with the product? What benefits have you realized?

1) We are able to load petabytes of metadata into HBase using MapReduce programs by creating HFiles in HDFS.

2) We are able to schedule our jobs through Oozie, running as the yarn user, by keeping the relevant files in HDFS.

3) We are able to store both content (any flat file) and metadata in the form of HFiles in HDFS and finally load them into HBase.

4) We store dated logs in HDFS that keep track of each job.

5) We run a purging module to delete files from HDFS once they are loaded into HBase.

Hadoop HDFS review by Rupesh A.
Validated Reviewer

"Hadoop distributed file system review"

What do you like best?

The Hadoop distributed file system is a distributed, scalable, fault-tolerant, and very efficient data storage platform. It is used to store data and can support data processing frameworks like MapReduce and Spark. The best thing about HDFS is that it can be combined with many other tools to create a solution. Another great thing about HDFS and the Hadoop framework is that, for training purposes, we can even create a single-node cluster on a laptop.

What do you dislike?

There is not much to dislike, but speed drops when we deal with small files. If there are a lot of small files to save, the NameNode comes under pressure because it has to keep an entry for each of them; the metadata grows, and performance decreases.
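The NameNode pressure described above can be estimated with a back-of-the-envelope model. This Python sketch assumes the commonly quoted rule of thumb that every file and block object costs roughly 150 bytes of NameNode heap; that figure is an approximation rather than an exact constant, and `namenode_heap_estimate` is a hypothetical helper.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb: heap cost per file or block object
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB   # assumed: Hadoop 2+ default block size


def namenode_heap_estimate(num_files, file_size, block_size=BLOCK_SIZE):
    """Very rough NameNode heap (in bytes) to track num_files equal-sized files.

    Counts one namespace object per file plus one per block. Memory for
    directories and replica locations is ignored in this simplified model.
    """
    blocks_per_file = -(-file_size // block_size)  # ceiling division
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


# The same 1 GiB of data, stored two different ways:
small = namenode_heap_estimate(8192, 128 * 1024)  # 8,192 files of 128 KiB
large = namenode_heap_estimate(1, 1024 * MB)      # one 1 GiB file (8 blocks)
print(small, large)  # 2457600 1350
```

Under these assumptions, the same gigabyte of data costs well over a thousand times more NameNode memory when it is split into small files.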

Recommendations to others considering the product

HDFS is a must-use solution, as it is a complete storage solution for us, and so far we have not found anything that can replace it. It can be used with Spark as well as MapReduce for real-time analysis as well as batch processing. We can also store data using different compression techniques, which is a very good thing.

What business problems are you solving with the product? What benefits have you realized?

We are using HDFS as the storage solution for large data coming from a legacy system, and we use it with the MapReduce and Spark frameworks to analyze that data. We also use HBase and Apache Phoenix on top of it. It serves as the storage where our MapReduce programs save intermediate and final outputs.

Hadoop HDFS review by Sushant D.
Validated Reviewer
Verified Current User

"Big Data Storage "

What do you like best?

1. Data access is very fast for huge amounts of data.

2. It keeps multiple copies, which makes it fault tolerant in failure scenarios.

3. It mainly uses commodity hardware, making it cost effective.

What do you dislike?

The major drawback is with small amounts of data, where a traditional system would take less time to do the processing.

Recommendations to others considering the product

I would surely recommend Hadoop to people dealing with large amounts of data.

It makes the business solution easy, at an affordable price.

What business problems are you solving with the product? What benefits have you realized?

On the business side, we are moving from an old storage system to Hadoop storage, as the amount of data is growing day by day, and we need a better and more feasible way to deal with this problem.

Hadoop HDFS review by Prashanth P.
Validated Reviewer

"Big Data Storage"

What do you like best?

HDFS, or the Hadoop Distributed File System, is the storage component of Hadoop, where all the data resides at the end of the day. It is to Hadoop what a hard disk is to a computer, but it is actually a type of file system that allows users to store data.

HDFS is very cost efficient. It is also fault tolerant, as it makes replicas of the data stored in it, thus maintaining a backup of all the files on commodity hardware.

What do you dislike?

The major drawback of Hadoop is the lack of security measures for sensitive information. This may not apply purely to HDFS, but as a component of Hadoop, HDFS falls under this category.

It also takes time to process small amounts of data, which makes it less suitable for small data than for bulk data.

Recommendations to others considering the product

For storing huge data and managing this bulk data in a robust and cost-efficient manner, I would surely recommend HDFS. But if the amount of data to be dealt with is small, the Hadoop file system is not recommended, as it takes some time to process.

What business problems are you solving with the product? What benefits have you realized?

In our business, we are moving from a traditional file system to the Hadoop file system, as the amount of data is growing day by day. To handle this situation, we are moving to Big Data, to manage this bulk data in a robust and efficient manner. The cost of installation is also very low, as Hadoop works on commodity hardware, and keeping replicas of files makes it fault tolerant.

Hadoop HDFS review by Dhharvi S.
Validated Reviewer
Verified Current User

"Storing Bulk Data"

What do you like best?

HDFS is fault tolerant and can handle bulk data easily.

The cost of setting up a Hadoop system is low at the end of the day, which means a lower cost for storing bulk data.

It makes multiple copies of each piece of data, which can be treated as a backup.

What do you dislike?

There is not much to dislike, but here are a few things:

1. Security is not enabled by default in the Hadoop system, which leads to a lack of security constraints.

2. It is not a good fit for small data, as it may take the same amount of time as for bulk data.

Recommendations to others considering the product

The Hadoop file system should be recommended only for storing bulk data.

What business problems are you solving with the product? What benefits have you realized?

Loading all the data from old file system to Hadoop

Hadoop HDFS review by Ritwik K.
Validated Reviewer

"Caters to all your Data Processing Needs"

What do you like best?

HDFS is inexpensive for two reasons. First, the filesystem relies on commodity storage disks that are much less expensive than the storage media used for enterprise-grade storage. Second, the filesystem shares the hardware with the computation framework as well, in this case MapReduce. Also, HDFS is open source and does not levy a licensing fee on the user.

HDFS has been around for more than 7 years and is considered mature technology. There is a large community behind it and a broad range of organizations that are storing petabytes of data on HDFS.

HDFS is optimized for MapReduce workloads. It provides very high performance for sequential reads and writes, which is the typical access pattern in MapReduce jobs.

What do you dislike?

The main drawback of HDFS is that it is not POSIX compliant. For example, existing data in an HDFS file cannot be modified in place once it has been written.

What business problems are you solving with the product? What benefits have you realized?

We have around 50 GB of data generated per hour per colo (we operate from 4 colos). Hadoop is used at InMobi to make sense of this data.

Hadoop HDFS review by Indrajeet S.
Validated Reviewer

"A file System of its Own"

What do you like best?

As the name itself says, it's the Hadoop Distributed File System: a file system of its own, just like FAT, FAT32, NTFS, ext4, or whatever other file systems we have seen. It is a file system for storing data.

It is better than other storage systems because it manages the storage and distribution of data across machines by itself (although each DataNode ultimately keeps its blocks on its local file system).

The other thing I like is that it can be scaled to any extent and is capable of handling petabytes of data. It can store data and make it available to any large resource.

And the Ambari UI is awesome to work with.

What do you dislike?

The only drawback is that it is built in Java, so it is very complex to start working with this solution; it requires a lot of experience to start working with Hadoop.

Recommendations to others considering the product

Yes. The name itself tells you that you should go with this solution; this is the future of data analytics.

What business problems are you solving with the product? What benefits have you realized?

We had a huge database and wanted to store digital data as well as data stored in tables and CSV files.

So, as a future-proof solution, we went with Hadoop.

Hadoop HDFS review by User in Computer Software
Validated Reviewer
Verified Current User

"The industry standard for open source distributed filesystems"

What do you like best?

HDFS benefits from a vibrant community of passionate open-source software contributors who have made it the filesystem of choice for users trying to get fault-tolerance and performance without vendor lock-in. It also has a number of easy-to-use access points (the HDFS shell, Java API, Thrift, and REST being the most popular), which means you can reach your data through whatever means you'd like.

What do you dislike?

HDFS is not the easiest distributed filesystem to use and a number of design decisions made have led to some believing that it's not as performant as it could be since it tries to be everything for everyone. Look elsewhere if you have a very specific use-case as far as availability is concerned, for example.

What business problems are you solving with the product? What benefits have you realized?

My company integrates a variety of data sources through workflows that include HDFS as both a source and a destination. The vast ecosystem of access points has made it among the smoothest parts of our architecture to incorporate.

Hadoop HDFS review by Jonathan A. A.
Validated Reviewer
Verified Current User

"I found that HDFS stored large data sets reliably, and streaming data sets at high bandwidth was awesome"

What do you like best?

Well...HDFS files are write once files. I consider HDFS files as write-once and read-many files. There is no concept of random writes. It is optimized for streaming access of large files. I would typically store files that are in the 100s of MB upwards on HDFS and access them through MapReduce to process them in batch mode.

What do you dislike?

HDFS doesn't do random reads very well. A caveat of HDFS to remember: it is a distributed file system that Hadoop abstracts on top of the local file system, suitable for storing huge files; however, it does not provide tabular storage as such.

Recommendations to others considering the product

MUST UNDERSTAND!

HDFS is not a NoSQL database, although both may serve the purpose of storing huge data in a distributed manner.

The key differences are:

In HDFS, it's easier to store the data, but reads retrieve entire rows, i.e., there is no specific key-based data access.

HDFS is mainly for write once, read many.

Updating an existing file in HDFS means creating a new file.

Many of the leading NoSQL stores support HDFS.

Mainly: HDFS is a distributed file system, while NoSQL is a data store, an alternative to an RDBMS.

What business problems are you solving with the product? What benefits have you realized?

I am trying to understand how to merge files without copying them down locally, using the built-in Hadoop commands. I am currently writing a MapReduce tool that uses IdentityMapper and IdentityReducer to repartition the files. I may merge all the files into a single file on HDFS by running the job with just 1 reducer. If, on the other hand, I want to partition the files into more parts, I will run the job with more reducers.
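The repartitioning idea above (identity map and reduce, with the number of reducers deciding the number of output files) can be simulated in plain Python. This is a hypothetical sketch, not Hadoop code: it mirrors the shape of Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, but uses `zlib.crc32` as a stand-in hash, so the key-to-reducer assignments will not match a real job.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key (hypothetical helper).

    Same shape as Hadoop's HashPartitioner, but zlib.crc32 stands in for
    Java's hashCode(), so assignments differ from a real Hadoop job.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def identity_repartition(records, num_reducers):
    """Simulate an identity map + identity reduce job.

    Records pass through unchanged; the shuffle groups them by partition,
    and each reducer's output is sorted (by key, ties broken by value here).
    """
    outputs = defaultdict(list)
    for key, value in records:  # identity map: emit (key, value) as-is
        outputs[partition(key, num_reducers)].append((key, value))
    # One output "part file" per reducer.
    return [sorted(outputs[r]) for r in range(num_reducers)]


records = [("b", 2), ("a", 1), ("c", 3), ("a", 4)]
print(identity_repartition(records, 1))  # [[('a', 1), ('a', 4), ('b', 2), ('c', 3)]]
print(identity_repartition(records, 3))  # three part files; the split depends on the hash
```

With one reducer every record lands in a single output file (the merge case); with more reducers the same records are spread across more files.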

Hadoop HDFS review by Suraj D.
Validated Reviewer

"HDFS: A File System of Its Own. SERIOUSLY"

What do you like best?

It's a file system like NTFS, ext4, FAT32, or any other file system you may know.

So now we have a file system for data-related activities, and that excites me even more than what it actually does.

It has scaled data storage to a level where it has evolved into a mainstream file system.

What do you dislike?

There is nothing to dislike about the product except that it is immensely big.

It's a huge thing to learn and implement, and to become experts we need to dedicate ourselves to it.

Recommendations to others considering the product

You should definitely implement this solution to become future-proof.

What business problems are you solving with the product? What benefits have you realized?

We needed to keep pace with the world and needed a solution for the future.

If data scales up, the storage should be able to handle it.

Hadoop HDFS review by kapil c.
Validated Reviewer
Verified Current User

"Very rich ecosystem and is there to stay"

What do you like best?

All types of tools are available on top of HDFS, like Pig/Hive (which help reduce the time spent writing MR jobs), Sqoop (to transfer data between an RDBMS and HDFS), NoSQL databases that can use it as their storage file system (e.g., HBase), and many more. Plus, many big organizations (Cloudera, MapR, and Hortonworks, to name a few) actively contribute to the HDFS ecosystem.

What do you dislike?

For a beginner or first-timer, it is a bit difficult to set up HDFS in cluster mode. Also, HDFS currently uses YARN as its resource manager, which was developed with very narrow thinking; it could be enhanced by looking at Mesos. Every stage's data (intermediate results) is stored on disk, so for stream processing HDFS is the worst choice.

Recommendations to others considering the product

1. For stream processing, don't even look at this.

2. Setup and maintenance are quite difficult.

3. The ecosystem is great, and the community is active in adding good features on top of HDFS.

What business problems are you solving with the product? What benefits have you realized?

I have to develop a rich analytical dashboard for our business client.

The benefits:

1. For batch processing with fault tolerance, it's simply the best.

2. Spring integration with Hadoop is no problem at all.

3. We have to use the column-based NoSQL DB HBase to support the dashboard analytics, and HBase on top of HDFS works like a charm.

Hadoop HDFS review by Edward S.
Validated Reviewer
Verified Current User

"Robust, Easy to Manage, Distributed FileSystem for Hadoop Applications"

What do you like best?

Automatic replication, stability, and compatibility with (indeed, a requirement for) the rest of the Hadoop ecosystem. It's pretty easy to manage. Rack awareness ensures that losing a single rack doesn't result in the loss of data. Overall, it's kind of an awkward thing to write a review about, since it is really an enabling technology. However, when combined with an analytical tool that can take advantage of HDFS (like Map/Reduce, Hive, Pig, etc.), HDFS shows its value. I'm also not aware of any alternatives to using HDFS with these tools.
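The rack awareness mentioned above comes from HDFS's default block placement policy: for a replication factor of 3, the first replica goes on the writer's node, the second on a node in a different rack, and the third on a different node in the same rack as the second. The Python sketch below is a simplified, hypothetical model of that rule; the topology, node names, and `place_replicas` helper are made up, and it ignores load, free space, and the case where the writer is not itself a DataNode.

```python
import random


def place_replicas(topology, writer_node, rng=None):
    """Choose 3 DataNodes for a block following the default rack-aware rule.

    topology maps rack name -> list of node names (hypothetical layout).
    1st replica: the writer's own node; 2nd: a node on a different rack;
    3rd: a different node on the same rack as the 2nd.
    Assumes at least one other rack has two or more nodes.
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    first = writer_node
    remote_racks = [r for r in topology
                    if r != rack_of[first] and len(topology[r]) >= 2]
    remote_rack = rng.choice(remote_racks)
    # Two distinct nodes on the same remote rack hold replicas 2 and 3.
    second, third = rng.sample(topology[remote_rack], 2)
    return [first, second, third]


topology = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5"], "rack3": ["n6", "n7"]}
print(place_replicas(topology, "n1", random.Random(42)))
```

Since the replicas always span two racks, losing any single rack still leaves at least one copy of the block.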

What do you dislike?

It's purpose-built for Hadoop and large-scale data processing. That said, it doesn't really work well as a general-purpose filesystem. You can mount it with NFS, but realize that as bad as NFS is, NFS on HDFS is worse. Don't get caught in this trap. For the most part, stick to interacting with it using Hadoop's tools, not generic filesystem tools over NFS.

Recommendations to others considering the product

It's basically a requirement for Hadoop Map/Reduce, Hive, Pig, and many other tools in the Hadoop ecosystem. Don't replace your SAN with HDFS, however, as it is missing key features for that purpose.

What business problems are you solving with the product? What benefits have you realized?

Large scale data processing. HDFS enables Hadoop to bring the code to the data as opposed to shipping the data to the code (like a massive DB server that has storage arrays). Redundancy and Scale.

Hadoop HDFS review by James O.
Validated Reviewer
Verified Current User

"6 Years On, Hadoop Is Still the Big Data Platform to Beat"

What do you like best?

After working with Hadoop for 6 years, I like the direction in which it has evolved. It started as a tightly coupled offering of a distributed data platform (HDFS) and an analytics processing framework (MapReduce); however, it has since expanded its scope tremendously. After the initial success of MapReduce, it became quite clear that it has many limitations as a processing model. Other frameworks, such as Apache Spark, have far surpassed its capabilities. Seeing that trend, the Hadoop team chose instead to focus on Hadoop as a base platform for dozens of data and analytics offerings. This led to the strengthening of the already robust HDFS and the creation of YARN as an application framework. This gives the Hadoop platform widespread appeal and makes it a great basis for any big data processing platform. Many tools exist for standing up entire Hadoop clusters in very little time, and it treats scaling and fault tolerance as first-order priorities.

What do you dislike?

There are two major areas where Hadoop could use improvement; both have existed since the beginning and continue to be a problem for implementing Hadoop in real-world settings. The first is poor documentation on performance and tuning. Hadoop works fairly well out of the box, but once you start to encounter problems, there are very few resources for troubleshooting those issues. Hadoop has been around long enough as an open source project that common configuration strategies and troubleshooting techniques should be built into the documentation. The second, and more important for many users, is the lack of security options for Hadoop. There are few, if any, built-in options, and any pluggable solutions are fairly difficult to implement.

Recommendations to others considering the product

Hiring experienced DevOps talent is more essential than analytic developers. The skills for analytic developers can be trained faster than those of skilled DevOps team members.

What business problems are you solving with the product? What benefits have you realized?

I have mostly been focused on large-scale analytic workflows for our clients. We have utilized Hadoop and its ecosystem components like Hive, Pig, Spark, Flume, Oozie, and others to implement scalable workflows for ETL and analytic applications. Hadoop enables the scale and reliability that our customers absolutely require. We have also used Hadoop as a basis for streaming analytics, using YARN and Spark for real-time data analysis.

Hadoop HDFS review by David G.
Validated Reviewer
Verified Current User

"I'm a BigData architect with 6 years experience in building Hadoop based infrastructure"

What do you like best?

It's currently the best distributed file system for implementing big data projects. The main reason is that HDFS is fully integrated with many parallel computing platforms for big data analysis: Map/Reduce, Spark, Impala, and Drill.

It's very mature, stable, and really robust, and it is optimised for streaming data to the application layer.

It now provides full support for security: POSIX-like permissions, POSIX ACLs, and directory-based encryption.

What do you dislike?

It works very badly with many small files. HDFS is optimised for dealing with a relatively small number of very big files. This annoying limitation forces many architectural decisions to support data ingestion effectively.

What business problems are you solving with the product? What benefits have you realized?

We solve data science problems on huge amounts of data. HDFS, as part of the bigger Hadoop ecosystem, provides all the pieces for ingesting, transforming, and analysing vast amounts of data using advanced parallel platforms.

Hadoop HDFS review by James C.
Validated Reviewer
Verified Current User

"Hadoop HDFS is mature solution for storing and processing big data if used in right way."

What do you like best?

Hadoop HDFS is proven to be scalable and stable enough for big data processing. I have over 6 years of experience in product development and operations on HDFS, storing and processing terabyte-scale data. Hadoop HDFS handles the scale problem well, and most problems can be solved.

What do you dislike?

HDFS is optimized for big files and batch-oriented data processing. Users really need to pay attention to the "Small File Problem" and avoid producing large numbers of small files, which will eventually kill HDFS.

What business problems are you solving with the product? What benefits have you realized?

We use HDFS to store web logs for a recommendation system, call detail records for a carrier, and device logs for a high-tech manufacturer. The benefits of HDFS are its scalability, its relatively low cost of storing data, and the ability to leverage MapReduce, Hive, Impala, and Spark for further data analysis.

Hadoop HDFS review by User in Internet
Validated Reviewer
Verified Current User

"Hadoop Review "

What do you like best?

I like MapReduce code a lot: mappers and reducers, and how the overall hierarchy works. Hadoop is a platform I chose a year ago, and I am still in love with it because of its simplicity in solving complex problems involving very large datasets. I also worked with Apache Giraph, which does graph processing, and it was a great experience learning a whole new product in the Hadoop ecosystem. I like solving real-life challenges using Hadoop, like predicting earthquakes so that their effects could be less devastating. This is one of my proposed ideas, but there are many more like it that I like a lot.

What do you dislike?

I dislike the numerous products being developed in the Hadoop architecture, because a developer can never learn all the products based on Hadoop. He or she can learn only a few, which are used very extensively. So why not integrate the less-used products into the most-used ones so that they gain the added functionality?

Recommendations to others considering the product

If you are considering switching to the Hadoop platform, don't set up Hadoop from scratch in the learning phase. Instead, concentrate on your learning and download the preinstalled Hadoop VirtualBox image from Cloudera, Hortonworks, MapR, etc. It took me around a month to fully configure single-node and multi-node clusters because I implemented them from scratch. Had I used the virtual images mentioned above, it would have helped me a lot and saved me time as well.

What business problems are you solving with the product? What benefits have you realized?

Actually, I was just learning Big Data and Hadoop, but soon I fell in love with it. Recently I did a project with it, reviewing the big sales data of NY stores and processing Udacity's discussion forums, which is a great way to analyze data and produce all the statistics.

Hadoop HDFS review by Administrator
Validated Reviewer
Verified Current User

"An essential primary tool for distributed programming and data management"

What do you like best?

HDFS supports features such as partitioning and replication that are actually mandatory in a distributed environment. Of course, many optimizations should be done over the next years, but the main concept will always be the same: move the code to the data, and keep your data safe regardless of failures. What I like best in HDFS is the user interface, which is pretty similar to a common local Linux filesystem. Moreover, HDFS is very compatible and can be integrated with the majority of the frameworks used today, like Hadoop and Spark. Last but not least, HDFS is open source, and a huge community supports it.

What do you dislike?

HDFS has some disadvantages as well. First of all, I would really like it to be more customizable and to provide more features in the user interface. That way, users would be free to play and experiment with new ideas that could be integrated with HDFS. I have also observed that someone has to be an expert to use it securely in an application, and there is not much documentation on how to achieve this specifically in HDFS.

Recommendations to others considering the product

There are a lot of systems with which someone can do this job, but HDFS will always be the most open one. It is also a very good choice for someone who is completely inexperienced with distributed programming, because there is a lot of documentation on the Internet.

What business problems are you solving with the product? What benefits have you realized?

The most common problem that I am trying to solve is big data management and the application of many algorithms to this kind of data. I am also currently trying to integrate it with a new architecture that I am working on.

Hadoop HDFS review by User in Computer Software
User in Computer Software
Validated Reviewer
Verified Current User
Review Source

"High Fault Tolerant File System"

What do you like best?

HDFS usually runs on top of commodity hardware, where failures can be common. I really like the fault-handling feature of HDFS because it can accommodate failures and still run MapReduce jobs in parallel at lightning speed.

What do you dislike?

Setting up HDFS is extremely painful, especially dealing with all the permissions and ownerships. In addition, so many products are being developed nowadays that it's extremely hard to keep up with them.

What business problems are you solving with the product? What benefits have you realized?

We have developed an application that shows a data map and the data flow from source to target. We use HDFS to store enterprise big data and then use proprietary software on top of HDFS and Titan to achieve that.

Hadoop HDFS review by Shulhi S.
Shulhi S.
Validated Reviewer
Review Source

"HDFS for logs storage"

What do you like best?

Distributed file storage is made easy by HDFS. I don't need to know where the files are stored physically on the servers, because HDFS exposes all the files as if they were on a single storage with multiple backups (depending on your replication factor). The HDFS API is straightforward to use.

What do you dislike?

Configuration. Getting HDFS running might be easy or complicated depending on your experience. We are using Hadoop together with Cloudera, so getting started was really easy for us. However, as with any other Hadoop component, fine-tuning HDFS can be tricky. Debugging HDFS can also be tricky; for example, HDFS can suddenly refuse writes because it is in safe mode. At the time I was using Hadoop, getting it to work with HA was also challenging, and the NameNode was a single point of failure. HDFS also doesn't work well with lots of small files. For average users, accessing HDFS can be daunting (I think HDFS has a web app running with limited functionality), but for developers it is no issue.
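To put the small-files point in numbers, here is a rough back-of-the-envelope sketch in Python. It assumes a 128 MB block size (the usual default in recent Hadoop releases) and the commonly quoted rule of thumb that every file and every block costs the NameNode roughly 150 bytes of heap; both figures, and the helper names, are illustrative assumptions rather than measurements.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB
BYTES_PER_OBJECT = 150          # rule-of-thumb NameNode heap cost per file or block object (assumption)


def namenode_objects(file_sizes):
    """Count namespace objects (one per file plus one per block) for a list of file sizes in bytes."""
    objects = 0
    for size in file_sizes:
        # Simplification: treat every file as having at least one block, however small it is.
        blocks = max(1, math.ceil(size / BLOCK_SIZE))
        objects += 1 + blocks
    return objects


def estimated_heap_bytes(file_sizes):
    """Rough NameNode heap estimate under the per-object rule of thumb above."""
    return namenode_objects(file_sizes) * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_gb = 1024 ** 3
    large = [one_gb]               # one 1 GB file
    small = [one_gb // 1024] * 1024  # the same 1 GB as 1,024 files of 1 MB each
    print(namenode_objects(large), estimated_heap_bytes(large))
    print(namenode_objects(small), estimated_heap_bytes(small))
```

Storing the same gigabyte as 1,024 one-megabyte files instead of one 1 GB file raises the estimated object count from 9 to 2,048, which is why the NameNode's memory, not the disks, tends to become the bottleneck with many small files.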

Recommendations to others considering the product

HDFS is a great tool if you're looking for a proven file storage solution that offers distributed storage and file backups. However, HDFS is just a file system and nothing more. I've had clients who think HDFS is like magic: put files into HDFS and analytics come out. Hadoop is prone to failure, so having someone who knows Hadoop inside and out is a great plus.

What business problems are you solving with the product? What benefits have you realized?

I was building an internal tool for managing and analyzing logs for business intelligence. We used the logs as our source to train a machine learning algorithm to detect system failures.

Hadoop HDFS review by Administrator
Administrator
Validated Reviewer
Review Source

"Hadoop HDFS"

What do you like best?

It is a good way to build large storage for distributed systems.

What do you dislike?

I don't know much about Hadoop HDFS.

But if I have to answer the question:

It is hard to use whenever I want to manage files or directories.

The commands are not comfortable for me.

I would like to use them like Linux commands.

They are a little bit different from Linux's, I think.

Recommendations to others considering the product

Hadoop HDFS is a good solution for distributed, large-scale data when you have to handle big data for text mining, or for data mining with machine learning and similar techniques.

What business problems are you solving with the product? What benefits have you realized?

My company works on data mining and machine learning using social data, etc.

One project needed predictions from that data, which meant building prediction models.

So we decided to build it using the Hadoop ecosystem.

Finally, we completed the project using Hadoop and MapReduce functions.

And I realized that it could be a pretty good solution.

Hadoop HDFS review by Administrator in Higher Education
Administrator in Higher Education
Validated Reviewer
Verified Current User
Review Source

"Hadoop is a tool with both bitterness and sweet"

What do you like best?

Hadoop is a collection of software that provides a distributed file system (HDFS) and a distributed processing mechanism on top of it (MapReduce). It is highly scalable and reliable. With Hadoop, users can specify their processing requirements on large datasets without worrying about the details of the underlying communication and data distribution. Hadoop can scale up easily to adapt to workload increases. The automatic data replication mechanism in HDFS ensures its reliability.

What do you dislike?

Hadoop is written in Java and it is not fast. It cannot handle data processing requests in real time. Its processing layer, MapReduce, simplifies the processing logic by supporting only a Map and a Reduce function, but that also makes complicated processing logic inconvenient to express.
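The point that MapReduce reduces every job to a map function and a reduce function can be illustrated with the classic word-count example. The sketch below simulates the three phases (map, shuffle/sort by key, reduce) in plain Python on a single machine; it is not the Hadoop Java API, only a model of the programming model, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all emitted values by key, as the framework does between the phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values seen for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The end"]))
```

Anything that does not fit this one-map-one-reduce shape, such as an iterative algorithm, has to be written as a chain of such jobs, which is the inconvenience described above.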

Hadoop adopts a master-slave architecture, but the master is designed in single-node mode: when the master node goes down, it is difficult to recover. Users have to purchase high-end hardware to prevent master-node failures.

Recommendations to others considering the product

If you have data that is large in size, use Hadoop. The initial setup and trial are simple, and you can easily figure out whether it is a good solution for your data processing requirements. Why not give it a try?

But Hadoop is not a solution for all big data problems. It cannot handle interactive, iterative, and real-time processing well.

What business problems are you solving with the product? What benefits have you realized?

By using Hadoop, we can explore much larger datasets and find the hidden essence in them in order to provide better service.

Developing, debugging, and deploying the tools has become easier and cheaper than before, and the scale of processing has expanded significantly.

Hadoop HDFS review by Aleksey I.
Aleksey I.
Validated Reviewer
Verified Current User
Review Source

"Good enough but not state of the art anymore (compared to Spark)"

What do you like best?

Distributed and fault tolerant, transparent. Mimics Unix FS features which are familiar to many users. Good fit for tech savvy users (config files, command line interface). Fast enough.

What do you dislike?

Not as fast as your local FS. More limited in tools and features than your familiar Unix environment. Bad fit for non-tech savvy users (config files, command line interface).

Not entirely POSIX compliant, but gains in performance because of that.

Recommendations to others considering the product

Well supported and understood solution at the moment. Consider other Spark based options if possible.

What business problems are you solving with the product? What benefits have you realized?

Processing large datasets and storing files as input for MapReduce. It simplifies the process quite a bit: no need to worry about replication and fault tolerance.

Hadoop HDFS review by Timothy S.
Timothy S.
Validated Reviewer
Review Source
Business partner of the vendor or vendor's competitor, not included in G2 Crowd scores.

"Hadoop Cluster Usage"

What do you like best?

Hadoop is a no-brainer for big data. The main killer feature is HDFS. Having a redundant WORM (write once, read many) file system is amazingly useful. There's a reason Google invented it and Yahoo made it open source. Three copies of your file just works. Cheap commodity servers, but still fast and stable. Never lose data; store everything. Access and use it in multiple use cases. So many tools and other projects around Hadoop make it a must-have for all enterprises and startups. Add Spark, which most distributions include, and you can do pretty much everything you need. Ambari and Hue make it easy to set up now.

What do you dislike?

There's a lot of stuff in Hadoop, and there are always 10 ways to do something, so it's hard to know which is best. Do you use Storm or one of 20 other frameworks? Should I store data in Parquet, ORCFile, Avro, CSV, or something else? Do you compress with Snappy or not at all? What level of encryption? Is Kerberos good enough for my security? Security is a bit lax, and there are definitely a lot of things to configure.

Recommendations to others considering the product

Try it out in one of the sandboxes. It's very easy to install with Ambari. The sandboxes are all set up and running with all the basic tools. Try the HDFS CLI and copy a few files into HDFS. Then try to access them through the CLI and through some basic HiveQL. It's easy to load, transform, and query your data, and easy to pull it out of SQL and drop it into HDFS. The only hard part is figuring out which tools to use for BI and for imports.

What business problems are you solving with the product? What benefits have you realized?

Storing everything, accessing everything, not losing data and rapid access to big and fast data. It's great for BI and for applications.

Hadoop HDFS review by 宗 .
宗 .
Validated Reviewer
Review Source

"I am using hadoop-hdfs in bioinformatics for human genome data"

What do you like best?

HDFS is highly available and scalable: I can expand the storage just by adding a few DataNodes. And with HDFS, genome data can easily be analyzed with MapReduce.
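Growing storage by adding DataNodes can be quantified with simple arithmetic: with a replication factor r, every logical byte occupies r bytes of raw disk, so usable capacity is roughly raw capacity divided by r. The sketch below assumes the default replication factor of 3, ignores reserved non-HDFS space and operating headroom, and uses hypothetical node sizes.

```python
def usable_capacity_tb(datanodes: int, disk_tb_per_node: float, replication: int = 3) -> float:
    """Approximate logical (usable) capacity in TB: total raw disk divided by the replication factor."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    raw_tb = datanodes * disk_tb_per_node
    return raw_tb / replication


if __name__ == "__main__":
    # Hypothetical cluster where each DataNode contributes 12 TB of disk to HDFS.
    for nodes in (6, 9, 12):
        print(nodes, usable_capacity_tb(nodes, 12.0))
```

Under these assumptions, adding three such nodes adds 36 TB of raw disk but only about 12 TB of usable space, since every block is stored three times.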

What do you dislike?

HDFS is not so good for small files, and the NFS gateway also doesn't work very well.

Recommendations to others considering the product

I think Hadoop has a very good community. Although HDFS still has some bugs (I think Hadoop YARN may have more, especially in the docker-container-executor), I think it will get better.

What business problems are you solving with the product? What benefits have you realized?

When using Java, it is not so easy to manipulate files in HDFS through the Hadoop API. I found an open-source project, jsr203-hadoop (https://github.com/damiencarol/jsr203-hadoop), that makes things simple: one can read and write HDFS files via the NIO API in JDK 1.7. At the time, I found a small bug in the project when I tried to move a file. I fixed the bug and the author (damiencarol) kindly merged my code.

Hadoop HDFS review by User in Computer Software
User in Computer Software
Validated Reviewer
Review Source

"Hadoop HDFS review"

What do you like best?

HDFS is the Hadoop Distributed File System. The thing I like best about HDFS is the reliability I get with Hadoop: its file replication is great and there is very little chance of your data being lost. To get the most out of Hadoop, keep your files big, at least 100 MB each. Then you will realize the power of Hadoop. It is a fault-tolerant file system, etc.

What do you dislike?

It's a little bit slower, but then what isn't slow when it comes to big file systems that work with terabytes of data? Even Amazon S3 is extremely slow when reading data from it. Other than that, it might be a little tricky to find documentation for new users and new features.

Recommendations to others considering the product

Get it from a free Apache Hadoop distribution. Don't try to get it from Apache directly, as you might face some trivial issues.

What business problems are you solving with the product? What benefits have you realized?

Mainly focusing on large-scale data storage with data duplication, file redundancy, scalability, etc. Used in conjunction with other big data components like Hive and Pig for ETL and analytics applications.

Hadoop HDFS review by Anshorimuslim S.
Anshorimuslim S.
Validated Reviewer
Verified Current User
Review Source

"Hadoop Easy Distributed"

What do you like best?

Well, I am using Hadoop HDFS as the filesystem for HBase. I found it really easy to deploy. I use Cloudera Manager for the Hadoop packages, which makes it even easier. If you have a lot of nodes, you will truly get the power of Hadoop HDFS.

What do you dislike?

It's quite troublesome to tune HBase and HDFS. At first, when we had few nodes, it didn't look better, but when we added more nodes, performance improved. But still, there is a lot of tinkering to do.

Recommendations to others considering the product

Use a good package; don't use a bare install.

What business problems are you solving with the product? What benefits have you realized?

Social Media monitoring and analytics

Hadoop HDFS review by Consultant in Information Technology and Services
Consultant in Information Technology and Services
Validated Reviewer
Verified Current User
Review Source

"The Hadoop DB next level of Storage"

What do you like best?

The scalability of the software.

It can scale both horizontally and vertically.

The data is always available.

And the Ambari UI, for the UI folks who want to see the changes.

What do you dislike?

I am unable to understand the implementation of Hive.

The community support is nice, but as a beginner you need to spend a lot of time studying the concepts.

Recommendations to others considering the product

Go for it; it has everything required to store all the data.

What business problems are you solving with the product? What benefits have you realized?

I am trying to create an offload repository to hold thousands of records.

Hadoop HDFS review by Viresh H.
Viresh H.
Validated Reviewer
Review Source

"Hadoop - Vulnerable By Nature"

What do you like best?

Distributes data and computation. Keeping the computation local to the data prevents network overload.

Partial failures are easy to handle: entire nodes can fail and restart. It avoids the crawling horrors of failure-tolerant synchronous distributed systems. Speculative execution works around stragglers.

What do you dislike?

1) Rough manner: Hadoop MapReduce and HDFS are rough around the edges, because the software is under active development.

2) The programming model is very restrictive: the lack of central data can be limiting.

3) Joins of multiple datasets are tricky and slow: no indices! Often, the entire dataset gets copied in the process.
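Without indexes, the usual way to join two datasets in MapReduce is a reduce-side join: both inputs are mapped to (join key, tagged record) pairs, the shuffle sends every record with the same key to the same reducer, and the reducer pairs them up. The toy version below, in plain Python with made-up `users` and `orders` records, shows why the whole of both datasets gets copied: every record passes through the shuffle whether or not it ends up matching. It models the technique only and is not Hadoop code.

```python
from collections import defaultdict


def reduce_side_join(left, right, left_key, right_key):
    """Inner-join two lists of dicts the way a reduce-side join would: tag, shuffle by key, then pair up."""
    # Map + shuffle: every record from BOTH sides is tagged with its source and grouped by join key.
    shuffled = defaultdict(lambda: {"L": [], "R": []})
    for rec in left:
        shuffled[rec[left_key]]["L"].append(rec)
    for rec in right:
        shuffled[rec[right_key]]["R"].append(rec)

    # Reduce: for each key, emit the cross product of the left and right records.
    joined = []
    for key in sorted(shuffled):
        group = shuffled[key]
        for l_rec in group["L"]:
            for r_rec in group["R"]:
                joined.append({**l_rec, **r_rec})
    return joined


if __name__ == "__main__":
    users = [{"user_id": 1, "name": "ana"}, {"user_id": 2, "name": "bo"}]
    orders = [{"order_id": 10, "uid": 1}, {"order_id": 11, "uid": 1}, {"order_id": 12, "uid": 3}]
    # All 5 records are shuffled, but only 2 joined rows come out.
    print(reduce_side_join(users, orders, "user_id", "uid"))
```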

What business problems are you solving with the product? What benefits have you realized?

One advantage of Hadoop over other distributed systems is its flat scalability curve. However, running Hadoop on a limited amount of data on a small number of nodes may not demonstrate particularly stellar performance, because the overhead involved in starting Hadoop programs is relatively high.
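The trade-off described here, a fixed start-up cost that only pays off when there is enough work to spread across nodes, can be illustrated with a deliberately simple model: job time = start-up overhead + work / nodes. The 30-second start-up figure and the workloads below are hypothetical, and the model ignores shuffle, scheduling, and I/O costs.

```python
def job_time(work_seconds: float, nodes: int, startup_seconds: float) -> float:
    """Toy model: fixed job start-up overhead plus perfectly parallel work split across nodes."""
    return startup_seconds + work_seconds / nodes


def speedup(work_seconds: float, nodes: int, startup_seconds: float) -> float:
    """Speedup of running on `nodes` machines versus one machine under the toy model."""
    return job_time(work_seconds, 1, startup_seconds) / job_time(work_seconds, nodes, startup_seconds)


if __name__ == "__main__":
    for work in (60, 36000):  # a 1-minute job versus a 10-hour job (hypothetical)
        print(f"work={work}s, speedup on 10 nodes: {speedup(work, 10, 30):.2f}")
```

With these numbers, the small job gets only about a 2.5x speedup from 10 nodes, while the large job gets close to 10x, which matches the observation that Hadoop shines only once the data is big enough.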

Hadoop HDFS review by Udita P.
Udita P.
Validated Reviewer
Verified Current User
Review Source

"Good "

What do you like best?

It is quick and performs well with small clusters too. I used it to implement my thesis. It could be a little challenging if you are not familiar with Linux.

What do you dislike?

The debugging and logging are not very user-friendly, although it has a decent interface for job tracking.

What business problems are you solving with the product? What benefits have you realized?

I used it for my thesis. Scaling is easy. It can be used for a lot of big data processing requirements.

Hadoop HDFS review by Anil G.
Anil G.
Validated Reviewer
Review Source

"Cost Effective and Reliable Data Platform"

What do you like best?

Open Source

Cost Effectiveness

Highly Scalable

Fault Tolerant

Highly Available

Active & Huge Community Support

Most mature and widely used Distributed Platform.

What do you dislike?

Running HDFS needs a lot of daemons (at least 3 ZooKeeper servers, 3 JournalNodes, and 2 NameNodes).
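The minimum of 3 ZooKeeper servers and 3 JournalNodes comes from majority quorums: an ensemble of N members stays available only while a strict majority is up, so it tolerates floor((N - 1) / 2) failures. The small sketch below computes that, and also shows why even-sized ensembles add no fault tolerance (4 members tolerate no more failures than 3). The function names are illustrative.

```python
def majority(n: int) -> int:
    """Smallest number of members that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Members that can fail while a majority quorum of n is still available."""
    return n - majority(n)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} members: quorum {majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```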

MapReduce programming is not very easy to learn.

Too many new projects in Apache Hadoop are distracting the community from building a limited set of robust products (there are more than 30 projects in Hadoop, and it's hard to keep track now).

Recommendations to others considering the product

I don't recommend using Hadoop if you have a very small dataset (less than 1 TB).

Adopting Hadoop/MapReduce has a learning curve, so I recommend doing POCs before settling on a solution.

Have at least 10 machines in production to take advantage of distributed systems.

What business problems are you solving with the product? What benefits have you realized?

We use Hadoop to store and process data. Its cost-effectiveness and speed have led to many optimizations in our data processing. Hadoop is the de facto standard for processing and storing data.

Hadoop HDFS review by Bharadwaj (Brad) C.
Bharadwaj (Brad) C.
Validated Reviewer
Review Source
Business partner of the vendor or vendor's competitor, not included in G2 Crowd scores.

"Great Option for Unstructured Data"

What do you like best?

Hadoop is a very popular big data framework. Hadoop is based on MapReduce, which makes it useful for big datasets. Hadoop can be used for almost any requirement involving huge data, including when the data is unstructured. The open source community has built tons of tools around it and evolved it into an ecosystem.

What do you dislike?

I don't see any dislikes; the only thing people get confused about is whether it's the right tool for solving every problem. Well, it's not.

Recommendations to others considering the product

If you answer yes to most of the following questions, Hadoop is recommended:

1. Data size: do you have TBs to petabytes of data?

2. How long can you wait? Hadoop is not an instant querying tool.

3. What data growth do you expect?

4. Can you manage without any real-time operations?

5. What percentage of your data is structured? The lower, the better.

What business problems are you solving with the product? What benefits have you realized?

Massive data collection, storage, and analytics. It is extremely cheap to get up and running. It does not need fancy hardware, and it's open source. If you are concerned that it is open source and are looking for support, there are enterprise Hadoop distributions from Cloudera, Hortonworks, and MapR.

Hadoop HDFS review by User in Computer Games
User in Computer Games
Validated Reviewer
Review Source

"Solid, scalable solution"

What do you like best?

HDFS is reliable and solid, and in my experience with it there are very few problems using it. If you have your own data centre and you use Hadoop, it's the obvious choice for reliably storing your data.

What do you dislike?

If your NameNodes all go down, then HDFS is pretty much useless, as you won't know which file blocks are where and which files they belong to; and I've read it's difficult (or impossible) to recover if you completely lose your NameNode file mappings. Fortunately, I've never personally seen this occur.
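Why losing the NameNode metadata is so serious can be seen from a toy model of what it tracks. DataNodes store blocks under block IDs only; the mapping from file paths to ordered block lists lives in the NameNode's namespace (persisted in the fsimage and edit log), while the block-to-DataNode locations are rebuilt at runtime from DataNode block reports. The class below is a simplified, hypothetical model with made-up paths and IDs, not HDFS's actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    # Namespace: file path -> ordered list of block IDs (what a real NameNode persists).
    files: dict[str, list[int]] = field(default_factory=dict)
    # Block map: block ID -> DataNodes holding a replica (rebuilt from block reports in real HDFS).
    locations: dict[int, set[str]] = field(default_factory=dict)

    def block_report(self, datanode: str, block_ids: list[int]) -> None:
        """Record that a DataNode holds the given blocks."""
        for block_id in block_ids:
            self.locations.setdefault(block_id, set()).add(datanode)

    def open(self, path: str) -> list[tuple[int, set[str]]]:
        """Resolve a path to (block ID, replica locations) pairs, in file order."""
        return [(b, self.locations.get(b, set())) for b in self.files[path]]


if __name__ == "__main__":
    nn = ToyNameNode(files={"/logs/app.log": [1001, 1002]})
    nn.block_report("dn1", [1001, 1002])
    nn.block_report("dn2", [1001])
    print(nn.open("/logs/app.log"))

    # A NameNode that has lost its namespace still receives block reports,
    # but nothing records which file block 1001 belongs to, or in what order.
    lost = ToyNameNode()
    lost.block_report("dn1", [1001, 1002])
    print("/logs/app.log" in lost.files)  # False: the blocks exist, the file does not
```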

Recommendations to others considering the product

Again, you get it for free if you have your own Hadoop installation and run your own datacentre, so you might as well use it for archiving/storage/input to various ETL. Even if you're "in the Cloud" you usually have access to HDFS, even ephemerally, and it's quicker to do work on it directly than some systems such as Amazon S3 (of course you still need to persist your data back off of HDFS when you're done in such a situation).

What business problems are you solving with the product? What benefits have you realized?

HDFS is used for MapReduce processes, Hive tables, Spark job input, for backing up data... The list goes on. You get replication for free, which is also very useful.

Hadoop HDFS review by User in Education Management
User in Education Management
Validated Reviewer
Review Source

"Used HDFS to store the crawled Polish internet"

What do you like best?

Well, I like the basic idea: it is a distributed filesystem used to store and transform large datasets. Since we faced the problem of storing and processing multi-terabyte datasets, it was only natural to use HDFS.

What do you dislike?

Well, it is trivial to fool HDFS security, and it was completely ineffective. You have to rely on additional tools such as Kerberos, and that adds complexity to your company.

Recommendations to others considering the product

Hire a person whose only responsibility is to manage this zoo. You will never use HDFS in isolation (see the security concerns above), and at some point the cost of managing all the Apache big data projects will be a significant burden on your talent pool. It is also wise to take advantage of existing bundles (Cloudera, Hortonworks).

Otherwise the risk of not going beyond experimentation phase is quite significant.

What business problems are you solving with the product? What benefits have you realized?

The main business problem was to crawl and store Polish websites. We used the CommonCrawl dataset as a source, the Akka framework for highly parallel data processing (it was a "trivially parallelizable" problem), and HDFS with Cassandra and Apache Spark for storage and processing.

Hadoop HDFS review by Internal Consultant in Information Technology and Services
Internal Consultant in Information Technology and Services
Validated Reviewer
Review Source

"Best distributed file system for large datasets"

What do you like best?

What makes a successful platform is scalability and reliability. These are two traits Hadoop handles perfectly. When you work with large datasets, Hadoop helps you process them without worry.

HDFS is the best distributed file system for large datasets. It comes with integrations for many parallel computing platforms. It is the tool you need to scale with huge datasets, and it works magic for us.

What do you dislike?

As with any tool, Hadoop is not a silver bullet for all data-related tasks. In cases where the dataset is small or involves transactions, one can feel Hadoop is not up to the task.

But no complaints here, as we have to realise that Hadoop is not built for small datasets or transactional data.

What business problems are you solving with the product? What benefits have you realized?

1. We have a huge data pipeline.

2. It helps to bring up business reports.

3. We use it for data science analysis, to improve our model.

Hadoop HDFS review by Consultant in Information Technology and Services
Consultant in Information Technology and Services
Validated Reviewer
Verified Current User
Review Source

"Serves an important purpose, though it's often misused or misunderstood"

What do you like best?

It is extremely flexible and able to handle the largest datasets, while the Map/Reduce pattern makes it easy to reason about program behavior.

What do you dislike?

Although it's getting better, it would be nice to improve the Streaming API.

Recommendations to others considering the product

Others have mentioned a lower limit of 1 TB of data, which I generally agree with, although I might say that you should try to stay within conventional systems up to 3-5 TB if possible. Smart indexing and sharding can even take you past that.

What business problems are you solving with the product? What benefits have you realized?

The business problem was extracting and refining data from a large unstructured corpus. Hadoop allowed us to scale this process and be able to iterate using the entire dataset.

Hadoop HDFS review by User in Computer Software
User in Computer Software
Validated Reviewer
Review Source

"Relying on Hadoop since 2008"

What do you like best?

HDFS - solves a big problem and does it well.

It's likely one of the most scalable distributed file systems and most reliable one.

Hadoop, in general has a large expert community around it and it's well maintained and supported.

It's composable which allows other systems (e.g. HBase, Spark, etc.) to layer on top nicely without compromising too much.

HDFS is performant in that it is transparent with regard to the physical location of data, allowing efficient data manipulation.

What do you dislike?

It's a bit convoluted and not as easy as it should be.

No good separation of client / server configurations.

HTTP APIs are clumsy.

Recommendations to others considering the product

HDFS is great.

MapReduce is also part of Hadoop, but practically obsolete.

What business problems are you solving with the product? What benefits have you realized?

HDFS solves a huge business problem - that of cheap, but reliable big data storage.

It allows us to both store the data and have it available for processing in a scalable manner.

Hadoop HDFS review by Pradeepkumar K.
Pradeepkumar K.
Validated Reviewer
Review Source

"The big data framework"

What do you like best?

Hadoop is the most common term used in the big data world. Whatever you do with big data, Hadoop will be an underlying element, so it plays an important role in all data-related activities.

First of all, Hadoop does not replace a regular RDBMS. It is better suited to batch operations than to real-time ones.

More recently, tools such as Kafka and Spark Streaming can be fitted into the Hadoop stack to make real-time analytics possible.

Hadoop is best for processing very large datasets. It's cheaper!

Once the cluster is set up, adding or deleting nodes is simple.

There are very good user and developer communities working on Hadoop.

With recent Hadoop releases adding YARN, high availability, and federation, the product is getting better and better.

The strong benchmark results for Hadoop and MapReduce operations make it very promising to use.

All reporting and data warehouse operations can be shifted to the HDFS, Hive, and Hadoop stack.

Hadoop can do many things file-wise: it splits, merges, archives, unarchives, and more. Almost all kinds of file operations are compatible with Hadoop and HDFS.

Several use cases, like recommendation systems, social network analysis (graph data), and sentiment analysis, have been implemented successfully on Hadoop.

With Hive, things are much simpler. Just point tables to (delimited) data on HDFS and use it like a database or DWH.

Tools like Sqoop and Flume make it easy to move streaming data and legacy RDBMS data onto HDFS.

What do you dislike?

Frankly, I do not see any disadvantages or drawbacks to using Hadoop! It's simply great.

But there are some use cases which are not suitable for Hadoop :)

1. For all transaction purposes

2. For small data or structured data

Recommendations to others considering the product

Check yourself on:

1. Data size: do you have TBs to petabytes of data?

2. How long can you wait? Hadoop is not an instant querying tool.

3. What data growth do you expect?

4. Can you manage without any real-time operations?

5. What percentage of your data is structured? The lower, the better.

So if you answer yes to most of the questions, Hadoop is recommended.

What business problems are you solving with the product? What benefits have you realized?

1. Building data pipelines.

2. Generating reports on very huge datasets.

3. Performing operations and extracting results from huge datasets.

4. Data analytics.

Hadoop HDFS review by User in Computer Software
User in Computer Software
Validated Reviewer
Review Source

"I use hadoop for scaling machine learning solutions to fault detection problems. "

What do you like best?

The Hadoop platform is essentially an open industry standard in cloud computing; several essential tools for modern, production-quality machine learning applications support scaling via Hadoop/Spark.

What do you dislike?

The setup and configuration of Hadoop and Spark is its greatest weakness. It often takes a nontrivial amount of engineering time to set up and tune. Fortunately, services such as AWS allow you to hit the ground running without as much setup.

Recommendations to others considering the product

Use AWS if possible, but setting up your own cluster isn't as scary as it appears.

What business problems are you solving with the product? What benefits have you realized?

We are producing fault detection and prognostics for industrial machines and vehicles. We use several nonparametric statistics and a good deal of machine learning to get the job done. Hadoop has drastically lowered turnaround time on results, even in development (after the initial setup and growing pains subsided).

Hadoop HDFS review by User in Internet
User in Internet
Validated Reviewer
Verified Current User
Review Source

"hdfs, a mature and steady distributed storage engine "

What do you like best?

With HDFS, we can manage our data storage distributed over a cluster of ordinary machines.

Your data is reliable and never lost unless you remove it.

What do you dislike?

HDFS data cannot be used like local data.

If you want to process the data, you can only download it or run MapReduce on it.

Recommendations to others considering the product

HDFS is reliable, mature distributed storage; it's the best choice.

What business problems are you solving with the product? What benefits have you realized?

Distributed storage.

Hadoop HDFS review by Xiufeng L.
Xiufeng L.
Validated Reviewer
Verified Current User
Review Source

"My Experience of using HDFS"

What do you like best?

HDFS provides high scalability to manage large-scale data sets in a cluster.

What do you dislike?

It does not provide a user-friendly interface for interacting with HDFS, e.g., a GUI like those available for a DBMS.

What business problems are you solving with the product? What benefits have you realized?

I use HDFS to store research data in my project, a smart city project.

The benefit is still the scalability, and the ability of managing large data sets.

Hadoop HDFS review by User
User
Validated Reviewer
Review Source

"Hadoop HDFS Review"

What do you like best?

Ease of accessibility via terminal using standard bash commands.

What do you dislike?

There are a few UIs to access it without a terminal, but we could use one that is bug-free and has all the features.

Recommendations to others considering the product

Definitely should try this out and think about switching from current file management system to this one. While it may take some time to get used to it, once done it will be super easy to use and improve current implementations.

What business problems are you solving with the product? What benefits have you realized?

Data storage and code sharing.

Hadoop HDFS review by Chirag M.
Chirag M.
Validated Reviewer
Verified Current User
Review Source

"Hadoop and its future"

What do you like best?

Databases in the form of large datasets are what is needed today. Hadoop does that with integrity and robustness, and its plugin development helps a lot as well.

What do you dislike?

Nothing as such, but it should provide APIs for the most popular languages.

What business problems are you solving with the product? What benefits have you realized?

I work in the banking and storage domain, so a database is a must.

Hadoop HDFS review by Tuan T.
Tuan T.
Validated Reviewer
Review Source

"Tuan's experience on Hadoop HDFS"

What do you like best?

HDFS is the most native file system in Hadoop, supported in all Hadoop-based frameworks and programming APIs. It supports replication and format checking nicely.

What do you dislike?

Just like local file systems, HDFS requires you to write a lot of verbose code to handle opening and closing files and their readers.

Recommendations to others considering the product

HDFS is the first native IO protocol you must master before considering other, more advanced data management stacks.

What business problems are you solving with the product? What benefits have you realized?

We use HDFS to handle our data in all the European projects I'm involved in: sending and receiving financial data, web archives, social media stream crawls, etc.

Hadoop HDFS review by Executive Sponsor in Computer Software
Executive Sponsor in Computer Software
Validated Reviewer
Review Source

"Extensive experience with HDFS in various environments, on premise, cloud, AWS etc"

What do you like best?

Is this a review of the HDFS file system? If so, the performance is great compared to, say, S3 or other ways to access file systems on Hadoop. It's also tried and tested. However, for the survey to make more sense, I will answer on Hadoop in general too.

What do you dislike?

Inflexible: data needs to be copied to HDFS from other places, and one cannot do real-time access from HDFS.

This survey is not well-written if it's primarily HDFS that you need feedback on.

Recommendations to others considering the product

Need a better filesystem. We should be able to import data from other sources faster. There should also be real-time (in-memory) capabilities built around HDFS.

What business problems are you solving with the product? What benefits have you realized?

Analytics, business intelligence. I typically want fast results for jobs, and I ended up using Impala on top of HDFS.

Hadoop HDFS review by User in Higher Education
User in Higher Education
Validated Reviewer
Review Source

"Highly Scalable Distributed Data Infrastructure"

What do you like best?

It is resilient and rack aware. Choose HDFS for >10TB data infrastructure where data lives in forms such as Thrift, Protobuf, JSON, etc. (for diverse datasets).

It works very well with technology like Mesos and Aurora.

Works with many solutions (Spark, HBase, Hadoop, Scalding, Cascading, Storm, etc.).

What do you dislike?

It's hard to set up. Turnkey solutions make it easier, as does contracting out the setup. Master election does require tuning and domain knowledge, so prefer off-the-shelf solutions over rolling your own deployment.

Recommendations to others considering the product

Stress test your configuration and test edge cases. Losing data is a tough process.

What business problems are you solving with the product? What benefits have you realized?

HDFS solved the business problem of highly scalable data storage in a highly available and scalable environment. Dump data into HDFS without worrying about ETL.

Hadoop HDFS review by Giridhur S.
Giridhur S.
Validated Reviewer
Review Source

"Very friendly solution for those using distributed databases "

What do you like best?

Very easy to set up. Lots of example code and tutorials available online to use.

Java is a well-known language, so it is very accessible for beginners.

What do you dislike?

Rewriting functions into MapReduce form is not always easy. Needs some practice. Also interfacing with other languages is slightly difficult.

Recommendations to others considering the product

Umm, I really can't think of a better alternative to this.

What business problems are you solving with the product? What benefits have you realized?

I used Hadoop to implement a similarity search algorithm across photos and videos. To handle the large amount of data and make it scalable, Hadoop was my first choice.

Hadoop HDFS review by Ben C.
Ben C.
Validated Reviewer
Review Source

"A 2 Node Hadoop Cluster"

What do you like best?

Setting up a 2 node Hadoop cluster was easy. We just used it as a distributed storage medium.

What do you dislike?

What is the hype behind this? One could easily set up a 2 node HDFS and say a product was conformant.

What business problems are you solving with the product? What benefits have you realized?

The business problem is showing compatibility with HDFS: demonstrating that data can be stored in HDFS, in Oracle, and in MySQL.

Hadoop HDFS review by Kainat R.
Kainat R.
Validated Reviewer
Review Source

"First priority for Big Data"

What do you like best?

The large block size gives Hadoop an advantage over others.

What do you dislike?

Since the block size is large, it does not perform well with data sets smaller than the block size.

What business problems are you solving with the product? What benefits have you realized?

Storing data sequentially and fetching large data sets.

Hadoop HDFS review by ruichang z.
ruichang z.
Validated Reviewer
Review Source

"use hdfs in frequent work"

What do you like best?

HDFS is a distributed file system. It is friendly to use, as its commands are similar to Linux file commands.

What do you dislike?

I think the response time is long. I also found what may be a bug: if I move the cursor to the middle of the last typed hdfs command to modify it, some of the characters after the cursor get erased.

Recommendations to others considering the product

It is open source and commonly used, and it is good software for big data coders.

What business problems are you solving with the product? What benefits have you realized?

Recommendation algorithm for iQIYI.

Hadoop HDFS review by Ankur S.
Ankur S.
Validated Reviewer
Review Source

"A Hadoop certified developer, used Hadoop file system as well"

What do you like best?

Scalable for super large datasets. But you really need to have huge data to need it. Most datasets do not really need hadoop.

What do you dislike?

It is slow. Even smaller datasets get spread out and need to be loaded by individual machines.

Recommendations to others considering the product

Ensure you really have huge data requirements; otherwise there are other, in-memory solutions.

What business problems are you solving with the product? What benefits have you realized?

The analysis of large datasets, but you cannot really do real-time analytics. It is all about batch processing.

Hadoop HDFS review by tousif k.
tousif k.
Validated Reviewer
Verified Current User
Review Source
Business partner of the vendor or vendor's competitor, not included in G2 Crowd scores.

"bigdata processing for batch data"

What do you like best?

Distributed MapReduce jobs to process batch data.

What do you dislike?

Requires an admin to monitor the cluster and keep it up and running.

Recommendations to others considering the product

If you have huge data processing needs in batch mode, then go for Hadoop MapReduce.

What business problems are you solving with the product? What benefits have you realized?

Batch data processing and reporting.

Hadoop HDFS review by User in Higher Education
User in Higher Education
Validated Reviewer
Review Source

"Awesome for large and unstructured data"

What do you like best?

Hadoop is a very popular big data framework.

Hadoop is based on MapReduce, which makes it useful for big datasets.

Hadoop can be used for almost any requirement involving huge data and also when data is unstructured.

To use Hadoop, one needs to know how MapReduce works and the features of distributed computing, preferably in Java.

What do you dislike?

Hadoop is not yet as good as legacy RDBMS products in terms of security.

Not all algorithms can be implemented with MapReduce.

Previously there was a single point of failure issue, but with federation and high availability in newer Hadoop releases, that is no longer an issue.

Recommendations to others considering the product

Hadoop is highly recommended for all ETL and reporting use cases.

What business problems are you solving with the product? What benefits have you realized?

It does not need fancy hardware.

It's open source: NO COST.

If you are concerned that it is open source and are looking for support, there are enterprise Hadoop flavors from Cloudera, Hortonworks, and MapR.

Hadoop HDFS review by Administrator in Information Technology and Services
Administrator in Information Technology and Services
Validated Reviewer
Verified Current User
Review Source

"BigB of BigData"

What do you like best?

Working on petabytes of data, it just works!!! No need for powerful machines; it works on commodity hardware.

What do you dislike?

Not something that I dislike, but a limitation: it's not meant for small files.

Recommendations to others considering the product

I don't see any alternatives to it. I would recommend going with Cloudera Manager rather than a plain vanilla install, because installation and management are easier.

What business problems are you solving with the product? What benefits have you realized?

Apache log analysis, analytics.

Hadoop HDFS review by Jianfeng Z.
Jianfeng Z.
Validated Reviewer
Review Source

"Use hdfs for store data warehouse data"

What do you like best?

Good scalability and fault tolerance. Easy access methods: shell and Java API.

What do you dislike?

Sometimes weird issues happen. Most of the time they can be resolved from the logs, but sometimes I have to restart the cluster.

What business problems are you solving with the product? What benefits have you realized?

We use HDFS to host data warehouse data that originally lived in an MPP database. It is very cheap compared to MPP, with fair performance.

Hadoop HDFS review by User
User
Validated Reviewer
Review Source

"Industry-grade scalability but a developer experience with room for improvement"

What do you like best?

Technology is free and open source. HDFS is an industry standard for big data processing. Hadoop is Java-based, making onboarding for a majority of developers easy.

What do you dislike?

The Hadoop application programming interface is terse; simple workflows sometimes need to be laboriously converted into its map-reduce paradigm. Other interfaces (e.g. Spark) mitigate this problem.

Recommendations to others considering the product

Be aware that Hadoop YARN is aimed at becoming the next generation of MapReduce.

What business problems are you solving with the product? What benefits have you realized?

I am a software engineer at a leading research institution. We are using Hadoop to train predictive models on large distributed data sets.

Hadoop HDFS review by Administrator in Internet
Administrator in Internet
Validated Reviewer
Review Source

"Hadoop Hdfs Review"

What do you like best?

Fairly intimidating at first but once you get a grip, it is easy to use. It can get complicated with all the additional functions/platforms that you can use on top of it. But I would certainly recommend the product.

What do you dislike?

It can get a bit intimidating and complicated with all the platforms on top of it.

What business problems are you solving with the product? What benefits have you realized?

Distributed computing and analytics.

Hadoop HDFS review by User in International Affairs
User in International Affairs
Validated Reviewer
Review Source

"I only use Hadoop for one off analytic projects which I can not perform on my laptop"

What do you like best?

I like it because it eliminates the bottleneck of insufficient computing power either on my laptop or company server.

What do you dislike?

Frankly, I haven't had any problems with Hadoop nor dislike any part of it. That being said, I am not a heavy user.

What business problems are you solving with the product? What benefits have you realized?

I had a project that required me to process NASA satellite images, which were massive (1 TB per file) and impossible to process on my laptop, especially when the analysis was conducted in R. This is where Hadoop came in and helped out.

Hadoop HDFS review by Niko G.
Niko G.
Validated Reviewer
Review Source

"Hadoop review"

What do you like best?

User doesn't have to think about the low-level functionalities.

What do you dislike?

Setup can represent an obstacle to a developer without system/OS knowledge.

What business problems are you solving with the product? What benefits have you realized?

Processed CDR (phone call records) dataset. Due to large amounts of data the problem couldn't be solved using traditional database or in-memory processing without HDFS.

Hadoop HDFS review by User in Internet
User in Internet
Validated Reviewer
Review Source

"Software Engineer"

What do you like best?

Hadoop makes it easy to scale to handle large datasets.

The API it provides is simple and intuitive to use.

What do you dislike?

APIs for other languages, e.g. C++, are not as complete as the one provided for Java.

What business problems are you solving with the product? What benefits have you realized?

Data cleansing, filtering, transformation and aggregation. Hadoop makes it very easy to scale to large data sets. This is especially important in the field of computational advertising, as we need to handle terabytes of raw data and need to get the results within a reasonable amount of time.

Hadoop HDFS review by Sunil S.
Sunil S.
Validated Reviewer
Review Source

"Best scalable distributed system"

What do you like best?

The biggest challenge of a highly available and distributed file system is eliminated with HDFS.

What do you dislike?

It is slow but very effective, and if you know the right file formats to use, the performance can be blazing fast.

What business problems are you solving with the product? What benefits have you realized?

Highly available data store, distributed, data lake.

Hadoop HDFS review by Roberto O.
Roberto O.
Validated Reviewer
Review Source

"Use in personal projects"

What do you like best?

We can use it to perform several tasks, from simple analysis to more advanced machine learning methods. I adapted some ML algorithms to take advantage of HDFS.

What do you dislike?

The learning curve and integration with other solutions.

What business problems are you solving with the product? What benefits have you realized?

Classification problems in massive log datasets

Hadoop HDFS review by Cataldo M.
Cataldo M.
Validated Reviewer
Review Source

"Very Nice Technology"

What do you like best?

It guarantees great scalability and good performance. Take care to use it in the right way. The latest version has also improved several features.

What do you dislike?

Not all the tasks can be exploited in a fruitful way.

What business problems are you solving with the product? What benefits have you realized?

Distributed Storage of Files. Scalability issues.

Hadoop HDFS review by Industry Analyst / Tech Writer
Industry Analyst / Tech Writer
Validated Reviewer
Review Source

"I face bugs in Fedora but not in Ubuntu"

What do you like best?

Hadoop is one of the best pieces of software, especially for cloud computing engineers.

What do you dislike?

I face so many problems when I want to change my Java version. It seems silly, but every time I want to move to a higher version, this happens to me.

What business problems are you solving with the product? What benefits have you realized?

I am working on data analytics

Hadoop HDFS review by User
User
Validated Reviewer
Review Source

"Store anything and everything"

What do you like best?

Storage capability, reliability, robustness, scalability. Can access through different tools.

What do you dislike?

Data is stored in key-value format. Sometimes the query gets stuck.

What business problems are you solving with the product? What benefits have you realized?

With the amount of data that organisations have generated, scaling conventional databases was a challenge with added cost. Hadoop HDFS resolves the problem.

Hadoop HDFS review by User in Defense & Space
User in Defense & Space
Validated Reviewer
Review Source

"Big data processing with HDFS"

What do you like best?

HDFS is effective for working with large files. The command-line functionality of HDFS is straightforward and options like put, get, copy are easy to use.

What do you dislike?

HDFS does not support some options that other file systems support.

What business problems are you solving with the product? What benefits have you realized?

HDFS is used with Hadoop for big data processing involved with online advertising.

Hadoop HDFS review by Consultant in Information Technology and Services
Consultant in Information Technology and Services
Validated Reviewer
Review Source

"The Hadoop ecosystem is currently the best choice for many big data projects."

What do you like best?

Enterprise support from different vendors makes it easy to 'sell' inside an enterprise, and there is a large ecosystem with tons of options.

What do you dislike?

Good big data engineers/data scientists are hard to find. There's a lot of misunderstanding at management levels about what the technology can and cannot deliver.

What business problems are you solving with the product? What benefits have you realized?

Worked on software to create 360 degree customer views and insights for industries with large customer bases (financial, retail, telecom, media).

Hadoop HDFS review by Administrator in Aviation & Aerospace
Administrator in Aviation & Aerospace
Validated Reviewer
Review Source

"ICT - CAE specialist"

What do you like best?

The possibility to manage a huge amount of data from different data sources.

The integration with languages like R or Python for mathematical analyses.

What do you dislike?

It's not easy to understand the best approach to solving your problem and the proper tool to use.

What business problems are you solving with the product? What benefits have you realized?

We need to manage a huge amount of sensor time histories and allow users to search and correlate them with other data sources.

Hadoop HDFS review by User in Computer Networking
User in Computer Networking
Validated Reviewer
Review Source

"If you deal with large data sets, Hadoop is the way to go"

What do you like best?

Easy learning curve and a huge community backing.

What do you dislike?

It's helpful only if the data set we're dealing with is huge. There are other competitors in the market that are easily adaptable, and even PaaS solutions that can be used turnkey.

What business problems are you solving with the product? What benefits have you realized?

Writing a business process that collects user data on the network and creates a report. It's extremely useful and scalable.

Hadoop HDFS review by Administrator in Information Technology and Services
Administrator in Information Technology and Services
Validated Reviewer
Verified Current User
Review Source

"A Great Tool if your Use Case Fits"

What do you like best?

The ability to capture all data in your own ecosystem. The ability to sift through that data and find business insights. In terms of hardware and software cost it is very cheap (especially with hosted cloud services).

What do you dislike?

As with all Hadoop tools, there are lots of knobs to tweak. It takes a good bit of time to optimize and fine-tune your Hadoop install.

What business problems are you solving with the product? What benefits have you realized?

Massive data collection, storage and analytics. It is extremely cheap to get this up and running.

Hadoop HDFS review by Brittany D.
Brittany D.
Validated Reviewer
Review Source

"Hadoop "

What do you like best?

It is easy to use, and has a simple interface.

What do you dislike?

Depending on the structure and size of your tables it can take a long time for queries to run, and it doesn't give you much insight into the length of time it might take.

Recommendations to others considering the product

I find it outdated compared to other tools, but it is fine. It gets the job done.

What business problems are you solving with the product? What benefits have you realized?

Providing business intelligence answers using simple or complex queries that run against your databases.

Hadoop HDFS review by User in Computer Software
User in Computer Software
Validated Reviewer
Review Source

"HDFS is effective for long-term storage"

What do you like best?

It is effective for long-term storage and easily scalable.

What do you dislike?

Writing to HDFS is a little bit slow.

Recommendations to others considering the product

Most effective for data warehouse as it is cheap and easily scalable.

What business problems are you solving with the product? What benefits have you realized?

We use HDFS as a data warehouse. It is reliable for long-term storage.

Hadoop HDFS review by User in Internet
User in Internet
Validated Reviewer
Review Source

"Hadoop for a trial user"

What do you like best?

It distributes data and computation. Keeping the computation local to the data prevents network overload.

What do you dislike?

The programming model is very restrictive. The lack of central data can be limiting.

Recommendations to others considering the product

Cassandra may be a better choice for data analytics tasks

What business problems are you solving with the product? What benefits have you realized?

Using Hadoop for data accumulation and event-generation tasks.

Hadoop HDFS review by User in Higher Education
User in Higher Education
Validated Reviewer
Review Source

"Hadoop Review"

What do you like best?

Apache Hadoop is powerful and easy to use.

What do you dislike?

Hadoop can be slow; Spark offers multiple advantages.

Recommendations to others considering the product

Think about the speed you need and your goals with using the product. Spark may be a better bet.

What business problems are you solving with the product? What benefits have you realized?

Handling big data on customers with ease

Hadoop HDFS review by User in Information Technology and Services
User in Information Technology and Services
Validated Reviewer
Review Source

"distributed storage of social media data for language analysis"

What do you like best?

Some details of file storage are obscured from user. Built-in redundancy.

What do you dislike?

The learning curve can be steep for new users.

What business problems are you solving with the product? What benefits have you realized?

Need to store tables of language data. We are able to have redundancy and speed, and concurrency hasn't been an issue.

Hadoop HDFS review by User in Insurance
User in Insurance
Validated Reviewer
Review Source

"Hadoop Made me Change my Major"

What do you like best?

Easy to use, easy to read, makes a lot of sense.

What do you dislike?

There is nothing about Hadoop I currently dislike.

What business problems are you solving with the product? What benefits have you realized?

Using Agile instead of Waterfall

Hadoop HDFS review by User in Computer Software
User in Computer Software
Validated Reviewer
Review Source

"New user to Hadoop, still learning the ropes"

What do you like best?

The high availability and how it interfaces with technologies like Flink and Spark.

What do you dislike?

The high barrier to entry for learning the tools and the software. Also, setting it up for development.

What business problems are you solving with the product? What benefits have you realized?

I am trying to create an Information Retrieval engine.

Hadoop HDFS review by Consultant in Banking
Consultant in Banking
Validated Reviewer
Review Source

"Business intelligence with hadoop"

What do you like best?

I like that Hadoop is easy to use and that it is simple to build other software on top of it.

What do you dislike?

I dislike how difficult it is to set up and to implement some functions.

What business problems are you solving with the product? What benefits have you realized?

A system for business intelligence with marketing data and analytics.

Hadoop HDFS review by User in Market Research
User in Market Research
Validated Reviewer
Review Source

"Hadoop is great if you know how"

What do you like best?

MapReduce is an amazing way to move large sets of data around.

What do you dislike?

Hadoop is very complicated compared to Spark, which is evolving rapidly. Complex syntax and deployment.

What business problems are you solving with the product? What benefits have you realized?

Moving a lot of data around

Hadoop HDFS review by User in Management Consulting
User in Management Consulting
Validated Reviewer
Review Source

"Great big data management system"

What do you like best?

Fantastic system for working with big data

What do you dislike?

No problems with the software as of yet.

Recommendations to others considering the product

Great big data software

What business problems are you solving with the product? What benefits have you realized?

Solving data management issues and working with large datasets.

Hadoop HDFS review by User in Human Resources
User in Human Resources
Validated Reviewer
Review Source

"I use it on a daily basis."

What do you like best?

The distributed nature of the data storage.

What do you dislike?

The amount of administrative overhead required to maintain it.

Recommendations to others considering the product

The de facto standard in storing data distributively.

What business problems are you solving with the product? What benefits have you realized?

Storing documents that need to be analyzed.

* We monitor all Hadoop HDFS reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. Validated reviews require the user to submit a screenshot of the product containing their user ID, in order to verify a user is an actual user of the product.