mapreduce
1. Combiner. Concept: summary calculation is normally the job of the ReduceTask; a combiner lets the MapTask side perform part of it first. Effect, by example: you are the head of a kindergarten and need to count the number of children from different cities in the kindergarten. In the past, every head teacher...
Posted by freakonaleash on Tue, 06 Jul 2021 07:35:47 +0930
Anti-crawler notice, no reprinting. Original blog address: https://blog.csdn.net/lys_828/article/details/118993512 (CSDN blogger: be)_ melting). It is not easy to organize knowledge, so please respect the author's work. The article is only published on the CSDN website; in other websites to see the...
Posted by RoBoTTo on Fri, 23 Jul 2021 02:26:00 +0930
This post introduces two ways to implement this case. Method 1: use the MapReduce program that ships with Hadoop. What is MapReduce? There is a detailed explanation in the Hadoop column of my blog, so it is not repeated here; the goal is mainly to complete this small case. 1. Find the directory where the file is located...
Posted by don117 on Thu, 16 Dec 2021 02:06:38 +1030
Shuffle is very important: it is the focus of tuning and a performance killer. Unoptimized shuffle (picture source: Beifeng network): the unoptimized shuffle has two characteristics. In early versions of Spark, shuffleMapTask flushes all data to disk only after writing it to the bucket...
Posted by saadatshah on Fri, 17 Dec 2021 06:02:12 +1030
Data quality directly affects the correctness of the final analysis results. To ensure high-quality data, we need to clean it. Cleaning has two functions: 1. Remove data of poor quality and filter out illegal data. 2. Conver...
Posted by lorddraco98 on Fri, 17 Dec 2021 14:17:17 +1030
Writing MapReduce programs with Hadoop. The routine for writing a MapReduce program: first write the map and reduce functions, debug them against a small part of the dataset in the IDE, and write unit tests (MRUnit). After successful debugging, release the program to the cluster environment. During this period, you may encou...
Posted by hayw0027 on Sat, 18 Dec 2021 04:25:07 +1030
Collaborative filtering algorithm: the item-based collaborative filtering algorithm has two main steps. 1. Calculate the similarity between items: this can be computed from co-occurrence counts, the cosine of the angle between item vectors, or Euclidean distance. 2. Generate a recomme...
Posted by raquelzinha on Mon, 03 Jan 2022 13:25:36 +1030
1. Requirement description. Hadoop comprehensive assignment requirements: 1. Upload the file to be analyzed (no fewer than 10000 English words) to HDFS. 2. Call MapReduce to count the number of occurrences of each word in the file. 3. Download the statistical results to the local machine. 4. Write a blog to descri...
Posted by goldenei on Mon, 03 Jan 2022 16:46:34 +1030
Personal study notes; all materials come from the Shangsi valley course on Bilibili. MapReduce framework principle: InputFormat data input. 1.1 InputFormat data input. 1.1.1 Mechanism that determines split and MapTask parallelism. Problem introduction: The...
Posted by blufish on Thu, 06 Jan 2022 00:15:49 +1030
Using distributed parallel computing to implement machine learning algorithms is an important direction in AI practice, because for AI computing over massive data the capacity of a single machine is often seriously insufficient. It is OK to do some experiments on your own machin...
Posted by alexszilagyi on Thu, 20 Jan 2022 15:39:53 +1030