What is Apache Big Data
Apache Hadoop: distributed storage architecture for large amounts of data
Hadoop Distributed File System (HDFS)
HDFS is a highly available file system, which is used to store large amounts of data in a computer cluster and is therefore responsible for data management within the framework. For this purpose, files are broken down into data blocks and distributed redundantly to different nodes without an organization scheme. According to the developers, HDFS is able to manage a number of files in the hundreds of millions. Both the length of the file blocks and the degree of redundancy can be configured individually.
The Hadoop cluster basically works according to the Master-slave principle. The architecture of the framework thus consists of a master node to which a large number of nodes are subordinate as slaves. This principle is also reflected in the structure of the HDFS, which is based on aNameNode and various subordinates DataNodes based. The NameNode manages all metadata on the file system, directory structures and files. The actual data storage takes place on the subordinate DataNotes. In order to minimize data loss, files are broken down into individual blocks and stored several times on different nodes. The standard configuration provides that each data block is available in triplicate.
Each DataNode sends the NameNode a sign of life, the so-called Heartbeat. If this signal is missing, the NameNote declares the respective slave to be “dead” and uses the data copies on other nodes to ensure that enough copies of the data blocks concerned are available in the cluster despite the failure. The NameNode thus has a central role within the framework. So that this does not become a "single point of failure", it is common practice to have this master node SecondaryNameNode to the side, which records all changes regarding the metadata and thus enables a restoration of the central control instance.
In the transition from Hadoop 1 to Hadoop 2, HDFS was expanded to include additional backup systems: NameNode HA (High Availability) supplements the system with automatic failover, which automatically starts a replacement component in the event of a NameNode failure. A snapshot function also enables the system to be restored to an earlier state. In addition, the Federation extension enables multiple NameNodes to be managed within a cluster.
- How much should an architecture rendering cost
- What are the greatest war films
- Why are energy conversions so inefficient
- What song from the 1980s is haunting you
- How do I make a homemade stromboli
- How do I make a carpet white
- Why is a nematode called a pseudocoeloma
- What kind of horse racing is best
- Which Muslim countries do not pray?
- What does bhesal mean in marathi
- Esperanto could seriously become the lingua franca?
- How can the surface of the ocean curl
- What is Canadian whiskey
- Which technical branch is suitable for robotics
- Congress will override Trump's budget veto
- What leaders manipulate their society today
- Is education an asset?
- What if Eren Jaeger didn't exist?
- Which engineering school is best for 2019
- Should you pay for the divination
- Which word is used instead of dint
- What is a microbial cloud
- How do I cure a dry penis
- What is hydraulic and pneumatic machines