Hadoop HDFS

(78)
4.3 out of 5 stars

Hadoop HDFS is a distributed, scalable, and portable filesystem written in Java.

Hadoop HDFS Reviews

Showing 82 Hadoop HDFS reviews
Hadoop HDFS review by Akash S.
Validated Reviewer
Verified Current User
Review Source

"Hadoop Ecosystem- THE BEST FOR BIG DATA ANALYTICS."

What do you like best?

HDFS is the Hadoop Distributed File System, written entirely in Java. It provides high scalability and redundancy. It stores both structured and unstructured data and offers fast data retrieval. Hadoop has a large community behind it, so many problems get solved quickly.

What do you dislike?

It is not a useful tool for beginners who want to build a career in big data analytics. MongoDB is still easier to use, and there are many tutorials available for learning it. Because the tool is hosted locally, it sometimes falls short. The UI is also not up to the mark.

Recommendations to others considering the product

From startups to enterprises, anyone who needs modern software for data analysis should use Hadoop HDFS. It lets you easily store structured as well as unstructured data and retrieve it later for further use. The Hadoop ecosystem includes a large number of tools that support various big data analytics functions, such as Flume, Hive, Pig Latin, Mahout, etc. Machine learning can also be done on big data in the Hadoop ecosystem.

What business problems are you solving with the product? What benefits have you realized?

Hadoop HDFS, as the name suggests (Hadoop Distributed File System), allows users to store large amounts of data, which can then be easily used for analysis. All my business analytics are done through Hadoop, which helps the business grow.

Hadoop HDFS review by Prabhudayal A.
Validated Reviewer
Verified Current User
Review Source

"HDFS- New era File System"

What do you like best?

1. Storing files in sequential format using key-value pairs: stores the file as the key and the content as the value, and encrypts them.

2. 128 MB block size: previously it was 64 MB, which was too small, though some people still prefer the old block size.

3. Storing multiple copies of data: the same data is stored on multiple nodes, which helps if a single node fails. As we all know, the whole purpose of HDFS is to use commodity hardware and still keep the service available.

4. Horizontal scaling and distributed architecture: this lets the system grow without worrying about existing data. You can increase the number of nodes if the amount of data changes drastically.
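
To make points 2 and 3 concrete, here is a minimal Python sketch of how block size and replication translate into block counts and raw disk usage. The 128 MB and 64 MB block sizes come from the review; the replication factor of 3 is an assumption (a commonly used HDFS default), and the function name `block_layout` is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # block size mentioned in the review
REPLICATION = 3      # assumption: not stated in the review


def block_layout(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is copied to `replication` different nodes.
    replicas = blocks * replication
    # A partial last block only occupies its real size, so raw usage
    # is simply the file size times the replication factor.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


print(block_layout(1000))                    # (8, 24, 3000)
print(block_layout(1000, block_size_mb=64))  # (16, 48, 3000)
```

With the smaller 64 MB blocks the same file needs twice as many blocks, which means more entries for the cluster to track; raw storage is unchanged.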

What do you dislike?

1. MapReduce jobs take a long time even for small amounts of data, because map and reduce tasks are created for every job. For small amounts of data, in the range of gigabytes, a relational database is recommended.

2. The immutable nature: files in HDFS cannot be modified in place. Appending is allowed, though.

Recommendations to others considering the product

Yes, it is recommended for bulk data.

What business problems are you solving with the product? What benefits have you realized?

We have developed a module to push .eml files to HDFS and store them in a sequential file.

Data retrieval is much faster compared to a traditional file system.

Hadoop HDFS review by Roshan M.
Validated Reviewer
Verified Current User
Review Source

"Big Data Storage"

What do you like best?

HDFS is fault tolerant because it keeps multiple replicas of the data stored in it, which makes it more reliable. Compared with traditional file systems, it is much more robust and efficient when working on bulk data, as processing is done via MapReduce in the back end. Another major advantage is that HDFS runs easily on commodity hardware, which keeps the initial setup cost very low. The commands used to manage files in HDFS are almost the same as those used in the shell, which makes them easy to pick up.
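
As an illustration of how closely the file commands mirror the shell, the following Python sketch maps a few familiar shell commands to their `hdfs dfs` counterparts. It only builds the argument lists and does not run anything; the helper name `hdfs_command` and the paths are hypothetical.

```python
# Sketch only: builds "hdfs dfs" argument lists; it does not execute them.
# Maps a few familiar shell commands to the matching "hdfs dfs" flag.
SHELL_TO_HDFS = {
    "ls": "-ls",
    "mkdir": "-mkdir",
    "cat": "-cat",
    "cp": "-cp",
    "mv": "-mv",
    "rm": "-rm",
}


def hdfs_command(shell_cmd, *paths):
    """Return the argv list for the HDFS counterpart of a shell command."""
    if shell_cmd not in SHELL_TO_HDFS:
        raise ValueError(f"no mapping for {shell_cmd!r}")
    return ["hdfs", "dfs", SHELL_TO_HDFS[shell_cmd], *paths]


# The paths below are hypothetical examples.
examples = [
    ("ls", ["/data"]),
    ("mkdir", ["/data/archive"]),
    ("mv", ["/data/a.log", "/data/archive/a.log"]),
]
for cmd, paths in examples:
    print(" ".join(hdfs_command(cmd, *paths)))
```

On a machine with a Hadoop client installed, each returned list could be passed to `subprocess.run` to execute the command.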

What do you dislike?

One of the major drawbacks of the Hadoop file system is that it is not efficient when the amount of data is small: processing small data takes about as long as processing bulk data.

By default, the security measures in the Hadoop file system are disabled, making it insecure for data storage.

Recommendations to others considering the product

Those who deal with bulk data in offline applications can surely go for the Hadoop file system, but it is not recommended for those dealing with small amounts of real-time data.

What business problems are you solving with the product? What benefits have you realized?

We are moving data from a traditional file system to Hadoop storage. As the amount of data is increasing day by day, we need a platform that manages this bulk data in a better, more robust, and more cost-efficient manner.

Thus HDFS acts as the base storage for historical records.

Hadoop HDFS review by Administrator
Validated Reviewer
Verified Current User
Review Source

"Brilliant for data mining across large cache rdbms"

What do you like best?

Hadoop can take loads of data from disparate sources quickly and performs well under testing performance conditions with multi-server configurations.

Hadoop is customizable, so most of our business objectives can be justified with the right combination of data and reports.

A very scalable product that handles a practically unlimited number of rows and a large number of parallel processors through dynamic clustering. The product is also very economical compared to SAS.

What do you dislike?

Limited organizational support. Bugs that need outside help take a long time to get fixed and pushed as an update.

It does not come with much built-in business knowledge, so the container needs a lot of programming to make it usable for a specific use case.

Recommendations to others considering the product

Use it if you have a heavy dataset (most other solutions will not work in this case). It is economical in the medium to long term, although the implementation cost is higher than for most other cloud-based solutions.

What business problems are you solving with the product? What benefits have you realized?

Using HDFS for storage and processing.

Intermediate data filtering as middleware.

Hadoop HDFS review by swetha m.
Validated Reviewer
Review Source

"Data Lake HDFS"

What do you like best?

1) The distributed file system partitions huge data sets across multiple machines, which helps in storing petabytes of data. It follows the WORM model: write once, read many times.

2) A master node distributes data among the nodes and maintains metadata such as file version and path, which makes files easier to locate.

3) Data loss: as data is stored on multiple data nodes, there is replication in case of any failure and very little chance of losing data.

4) Reading, copying, and moving files in HDFS using commands over PuTTY is easy.

5) Apache Ambari provides a user interface for the Hadoop ecosystem, which makes it easier to download, copy, rename, and move directories and files in HDFS and to change their permissions.

6) The use of checksums for data integrity helps detect data corruption.
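
The checksum idea in point 6 can be sketched in a few lines of Python: compute a checksum per fixed-size chunk when data is written, then recompute and compare on read. This is only an illustration of the technique, not HDFS's actual implementation. `zlib.crc32` is used as a stand-in checksum, and the 512-byte chunk size and the function names are assumptions.

```python
import zlib

CHUNK_SIZE = 512  # assumption: bytes covered by each checksum


def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Compute one CRC32 value per fixed-size chunk of data."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def corrupted_chunks(data, expected, chunk_size=CHUNK_SIZE):
    """Return the indexes of chunks whose checksum no longer matches."""
    actual = chunk_checksums(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Checksums are stored when the data is first written.
original = bytes(range(256)) * 8          # 2048 bytes, i.e. 4 chunks
stored = chunk_checksums(original)

# Simulate corruption of a single byte that lies in the second chunk.
damaged = bytearray(original)
damaged[700] ^= 0xFF

print(corrupted_chunks(original, stored))        # []
print(corrupted_chunks(bytes(damaged), stored))  # [1]
```

Because checksums are kept per chunk, a reader can tell which part of the data is damaged and fetch just that part from another replica.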

What do you dislike?

1) The namenode has no replication, so recovering from a namenode failure takes a lot of time.

2) Because of the block size, storing small files is not efficient.

3) It doesn't allow multiple users to write to a file.

Recommendations to others considering the product

1) HDFS is a filesystem with huge storage capacity that stores your files in a distributed manner across multiple networked machines.

2) It follows the write once, read many times principle, along with replication of data across data nodes.

3) A master node distributes data among the nodes and maintains metadata such as file version and path, which makes files easier to locate.

What business problems are you solving with the product? What benefits have you realized?

1) We are able to load petabytes of metadata into HBase using MapReduce programs by creating HFiles in HDFS.

2) We are able to schedule our jobs by keeping the relevant files in HDFS for the Oozie YARN user.

3) We are able to store both content (any flat file) and metadata as HFiles in HDFS and finally load them into HBase.

4) We store logs in HDFS for each date to keep track of the jobs.

5) We run a purging module to delete files from HDFS once they are loaded into HBase.

Hadoop HDFS review by Rupesh A.
Validated Reviewer
Review Source

"Hadoop distributed file system review"

What do you like best?

The Hadoop Distributed File System is a distributed, scalable, fault-tolerant, and very efficient data storage platform. It is used to store data and supports data processing frameworks like MapReduce and Spark. The best thing about HDFS is that many different tools can build on it to create a solution. Another great thing about HDFS and the Hadoop framework is that, for training purposes, we can even create a single-node cluster on a laptop.

What do you dislike?

There is not much to dislike, but speed drops when dealing with small files. If there are a lot of small files to save, the name node comes under pressure because it has to keep an entry for each of those files. The metadata grows, and performance decreases.
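
A rough back-of-the-envelope Python sketch shows why many small files put pressure on the name node. It assumes the often-quoted rule of thumb of about 150 bytes of name node memory per file or block object; that figure, the 128 MB block size, and the function names are assumptions for illustration, not measurements.

```python
import math

BYTES_PER_OBJECT = 150  # assumption: rough rule of thumb per file or block
BLOCK_SIZE_MB = 128     # assumption: block size used for the estimate


def namenode_objects(num_files, file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Count metadata objects: one per file plus one per block."""
    blocks_per_file = max(1, math.ceil(file_size_mb / block_size_mb))
    return num_files * (1 + blocks_per_file)


def metadata_mb(num_files, file_size_mb):
    """Estimate name node memory in MB for the given set of files."""
    objects = namenode_objects(num_files, file_size_mb)
    return objects * BYTES_PER_OBJECT / (1024 * 1024)


# The same 1 TB of data, stored two different ways.
small = metadata_mb(num_files=1024 * 1024, file_size_mb=1)  # 1 MB files
large = metadata_mb(num_files=8192, file_size_mb=128)       # 128 MB files

print(f"1 MB files:   {small:.1f} MB of metadata")   # 300.0 MB
print(f"128 MB files: {large:.1f} MB of metadata")   # 2.3 MB
```

Under these assumptions the same volume of data needs over a hundred times more name node memory when stored as 1 MB files, which matches the reviewer's experience.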

Recommendations to others considering the product

HDFS is a must-use solution: it is a complete storage solution, and so far we have not found anything that can replace it. It can be used with Spark as well as MapReduce, for real-time analysis as well as batch processing. We can also store data using different compression techniques, which is a very good thing.

What business problems are you solving with the product? What benefits have you realized?

We use HDFS as the storage solution for large data obtained from a legacy system, together with the MapReduce and Spark frameworks for data analysis. We use HBase on top of it, and also Apache Phoenix. Our MapReduce programs use it as storage for intermediate and final outputs.

* We monitor all Hadoop HDFS reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. Validated reviews require the user to submit a screenshot of the product containing their user ID, in order to verify a user is an actual user of the product.