diff --git a/assignment-3/handout/README.md b/assignment-3/handout/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..e086659bd86f27b6eab9ffb5684e76f50641dde5
--- /dev/null
+++ b/assignment-3/handout/README.md
@@ -0,0 +1,43 @@
+# Assignment 3. Clustering Algorithms
+
+- **Name: 高庆麾**
+- **Student ID: 19307130062**
+
+
+
+## Part 0. Implementation Notes
+
+The K-Means and GMM implementations carry fairly detailed comments in the code; most of the steps and implementation details are annotated in the relevant places, so please refer to the code.
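As a quick orientation before diving into the code: the K-Means loop alternates between assigning every point to its nearest center and moving every center to the mean of its assigned points. Below is a minimal, self-contained sketch of that loop (an illustration only, not the submission's implementation; the function name and seeded initialization are made up):

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Minimal K-Means sketch: alternate between assigning each point
    to its nearest center and moving each center to the mean of its
    assigned points."""
    rng = np.random.default_rng(seed)
    # initialize the centers from k distinct data points
    centers = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.zeros(len(data), dtype=int)
    for _ in range(iters):
        # squared distance from every point to every center: shape (n, k)
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():                 # skip empty clusters
                centers[j] = data[labels == j].mean(axis=0)
    return centers, labels
```

The submission's version additionally normalizes the data first and re-draws a random center whenever a cluster ends up empty.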
+
+
+
+## Part 1. Automatic Clustering Experiment
+
+The data consist of $n = 500$ samples of dimension $dim = 2$, drawn from Gaussians with standard deviation $1$; the centers are sampled uniformly from $[0,\ 1]$ and then scaled by a factor of $10$ to widen the gaps between clusters.
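The generation procedure described above can be sketched in a few lines of NumPy (a sketch only; the seed and variable names are illustrative, and the submission's actual generator lives in `test_model` in `source.py`):

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, n_clusters = 500, 2, 5

# centers drawn uniformly from [0, 1), then scaled by 10 to widen the gaps
centers = rng.random((n_clusters, dim)) * 10.0

# each sample picks a random center and is drawn from an isotropic
# Gaussian around it with standard deviation 1
labels = rng.integers(n_clusters, size=n)
data = rng.normal(loc=centers[labels], scale=1.0)
```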
+
+The results for cluster counts $3,\ 5,\ 7,\ 9,\ 11,\ 13$ are shown below:
+
+
+Judging purely by eye, the elbow method combined with our self-devised knee-selection rule (see the comments in the corresponding part of the code for details) matches what a human observer would pick very well; however, the generated clusters are genuinely irregular, and their locations are hard to capture through the sum of distances alone.
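The knee-selection rule mentioned here can be restated as a standalone sketch; `patience` and `step_ratio` mirror the observation budget `thres = 3` and the 10 % step used in the code, while the sample curve is made up for illustration:

```python
import numpy as np

def pick_elbow(sod, patience=3, step_ratio=0.1):
    """Pick the elbow of a sum-of-distances curve.

    sod[k] holds the total point-to-center distance for k clusters
    (index 0 is unused). A candidate k is accepted only if it undercuts
    the current minimum by at least step_ratio * sod[1]; after
    `patience` consecutive rejections the scan stops and the last
    accepted k is returned.
    """
    step = sod[1] * step_ratio                 # required drop per acceptance
    best, best_k, budget = sod[1], 1, patience
    for k in range(2, len(sod)):
        if sod[k] < best - step:               # big enough improvement
            best, best_k, budget = sod[k], k, patience
        else:                                  # not convincing: spend budget
            budget -= 1
            if budget == 0:
                break
    return best_k

# a curve that keeps dropping sharply up to k = 4, then flattens out
curve = np.array([0.0, 100.0, 60.0, 35.0, 20.0, 18.5, 18.0, 17.8, 17.7])
```

With this curve, `pick_elbow(curve)` settles on `4`.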
+
+
+
+## Part 2. Visualizing the Clustering Results
+
+The data consist of $n = 500$ samples of dimension $dim = 2$, drawn from Gaussians with standard deviation $1$; the centers are sampled uniformly from $[0,\ 1]$ and then scaled by a factor of $10$ to widen the gaps between clusters.
+
+The results of KMeans with $20$ iterations and cluster counts $5,\ 9,\ 13$ are shown below:
+
+As the number of clusters grows, the generated data become quite entangled, and KMeans produces results that look better to the eye, even though they may differ from how the original data were actually generated.
+
+If we widen the gaps between the clusters somewhat, GMM with $20$ iterations and cluster counts $5,\ 9,\ 13$ gives the following results:
+
+We can see that the means of the individual Gaussians often end up lumped together; after thinking through the logic of the algorithm, it does appear to have a genuine weakness in this respect.
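One simple way to make the observed mean-collapse measurable is to look at the smallest pairwise distance between the fitted means. `min_center_gap` below is a hypothetical helper for illustration, not part of the submission:

```python
import numpy as np

def min_center_gap(mu):
    """Smallest pairwise Euclidean distance between component means.

    A value near zero indicates that several Gaussian components have
    collapsed onto (almost) the same mean."""
    diff = mu[:, None, :] - mu[None, :, :]          # (k, k, d)
    dist = np.sqrt((diff ** 2).sum(-1))             # (k, k)
    off_diag = ~np.eye(len(mu), dtype=bool)         # ignore self-distances
    return dist[off_diag].min()

# two of these three "fitted" means have nearly collapsed:
mu = np.array([[0.0, 0.0], [0.01, 0.0], [5.0, 5.0]])
```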
+
+
+
diff --git a/assignment-3/submission/19307130062/README.md b/assignment-3/submission/19307130062/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..0dc22b88270f3e935c6160b60b0443b98021e005
--- /dev/null
+++ b/assignment-3/submission/19307130062/README.md
@@ -0,0 +1,45 @@
+# Assignment 3. Clustering Algorithms
+
+- **Name: 高庆麾**
+- **Student ID: 19307130062**
+
+
+
+## Part 0. Implementation Notes
+
+The K-Means and GMM implementations carry fairly detailed comments in the code; most of the steps and implementation details are annotated in the relevant places, so please refer to the code.
+
+The experiment also implements a high-dimensional GMM that supports data with more than two dimensions; see the code for details.
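The core quantity in the high-dimensional E-step is the multivariate Gaussian log-density. Below is a batched sketch that works for any dimension (consistent with the constant and quadratic terms computed in the code, but not the submission's per-point loop; it uses `slogdet` and `solve` instead of an explicit inverse for numerical stability):

```python
import numpy as np

def log_gaussian(x, mu, cov):
    """Log-density of N(mu, cov) evaluated at each row of x (any dim).

    log N(x) = -0.5 * (d*log(2*pi) + log|cov| + (x-mu)^T cov^{-1} (x-mu))
    """
    d = mu.shape[0]
    dev = x - mu                                   # (n, d)
    _, logdet = np.linalg.slogdet(cov)             # stable log-determinant
    # quadratic form for every row, via a solve instead of an inverse
    quad = np.einsum("ij,ij->i", dev, np.linalg.solve(cov, dev.T).T)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)
```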
+
+
+
+## Part 1. Automatic Clustering Experiment
+
+The data consist of $n = 500$ samples of dimension $dim = 2$, drawn from Gaussians with standard deviation $1$; the centers are sampled uniformly from $[0,\ 1]$ and then scaled by a factor of $10$ to widen the gaps between clusters.
+
+The results for cluster counts $3,\ 5,\ 7,\ 9,\ 11,\ 13$ are shown below:
+
+
+Judging purely by eye, the elbow method combined with our self-devised knee-selection rule (see the comments in the corresponding part of the code for details) matches what a human observer would pick very well; however, the generated clusters are genuinely irregular, and their locations are hard to capture through the sum of distances alone.
+
+
+
+## Part 2. Visualizing the Clustering Results
+
+The data consist of $n = 500$ samples of dimension $dim = 2$, drawn from Gaussians with standard deviation $1$; the centers are sampled uniformly from $[0,\ 1]$ and then scaled by a factor of $10$ to widen the gaps between clusters.
+
+The results of KMeans with $20$ iterations and cluster counts $5,\ 9,\ 13$ are shown below:
+
+As the number of clusters grows, the generated data become quite entangled, and KMeans produces results that look better to the eye, even though they may differ from how the original data were actually generated.
+
+If we widen the gaps between the clusters somewhat, GMM with $20$ iterations and cluster counts $5,\ 9,\ 13$ gives the following results:
+
+We can see that the means of the individual Gaussians often end up lumped together; after thinking through the logic of the algorithm, it does appear to have a genuine weakness in this respect.
+
+
+
diff --git a/assignment-3/submission/19307130062/img/acgtcnevvn 500, 2, 9.png b/assignment-3/submission/19307130062/img/acgtcnevvn 500, 2, 9.png
new file mode 100644
index 0000000000000000000000000000000000000000..ccd4f133638733bc11b883b1c4efb28c973cfcd0
Binary files /dev/null and b/assignment-3/submission/19307130062/img/acgtcnevvn 500, 2, 9.png differ
diff --git a/assignment-3/submission/19307130062/img/boqdtamgmq Elbow Method ALL15 500, 2, 9.png b/assignment-3/submission/19307130062/img/boqdtamgmq Elbow Method ALL15 500, 2, 9.png
new file mode 100644
index 0000000000000000000000000000000000000000..1703390178b74d5edfb1b6c91fcd2be15c73d2a6
Binary files /dev/null and b/assignment-3/submission/19307130062/img/boqdtamgmq Elbow Method ALL15 500, 2, 9.png differ
diff --git a/assignment-3/submission/19307130062/img/heyhertslr Elbow Method ALL15 500, 2, 11.png b/assignment-3/submission/19307130062/img/heyhertslr Elbow Method ALL15 500, 2, 11.png
new file mode 100644
index 0000000000000000000000000000000000000000..fb5190b7c07a13650459efc2310e3fdd4af99601
Binary files /dev/null and b/assignment-3/submission/19307130062/img/heyhertslr Elbow Method ALL15 500, 2, 11.png differ
diff --git a/assignment-3/submission/19307130062/img/hkmxtjeiaj Elbow Method ALL15 500, 2, 5.png b/assignment-3/submission/19307130062/img/hkmxtjeiaj Elbow Method ALL15 500, 2, 5.png
new file mode 100644
index 0000000000000000000000000000000000000000..c9ff4a6a358650703bf75f4ae2060a9614ea620e
Binary files /dev/null and b/assignment-3/submission/19307130062/img/hkmxtjeiaj Elbow Method ALL15 500, 2, 5.png differ
diff --git a/assignment-3/submission/19307130062/img/maddmqffwl Elbow Method ALL15 500, 2, 13.png b/assignment-3/submission/19307130062/img/maddmqffwl Elbow Method ALL15 500, 2, 13.png
new file mode 100644
index 0000000000000000000000000000000000000000..6013b0c9b5fd5f5e196f4f1185cba797c0302fd4
Binary files /dev/null and b/assignment-3/submission/19307130062/img/maddmqffwl Elbow Method ALL15 500, 2, 13.png differ
diff --git a/assignment-3/submission/19307130062/img/ndrdqqpmea Elbow Method ALL15 500, 2, 7.png b/assignment-3/submission/19307130062/img/ndrdqqpmea Elbow Method ALL15 500, 2, 7.png
new file mode 100644
index 0000000000000000000000000000000000000000..8cda99abd6d831bce4fff3af8394eb04d13df80f
Binary files /dev/null and b/assignment-3/submission/19307130062/img/ndrdqqpmea Elbow Method ALL15 500, 2, 7.png differ
diff --git a/assignment-3/submission/19307130062/img/onaixzshct 500, 2, 13.png b/assignment-3/submission/19307130062/img/onaixzshct 500, 2, 13.png
new file mode 100644
index 0000000000000000000000000000000000000000..dc6d5517c5e3011fd8ecab17756a6522ebb4a4b3
Binary files /dev/null and b/assignment-3/submission/19307130062/img/onaixzshct 500, 2, 13.png differ
diff --git a/assignment-3/submission/19307130062/img/pbtjfkfwjw 500, 2, 13.png b/assignment-3/submission/19307130062/img/pbtjfkfwjw 500, 2, 13.png
new file mode 100644
index 0000000000000000000000000000000000000000..dc61f97062a45b7458efa5426d9e2c3507c5fd8c
Binary files /dev/null and b/assignment-3/submission/19307130062/img/pbtjfkfwjw 500, 2, 13.png differ
diff --git a/assignment-3/submission/19307130062/img/uvqnndbvpu 500, 2, 9.png b/assignment-3/submission/19307130062/img/uvqnndbvpu 500, 2, 9.png
new file mode 100644
index 0000000000000000000000000000000000000000..927d6bb4a872518e9e45bbaf8279535f841364c0
Binary files /dev/null and b/assignment-3/submission/19307130062/img/uvqnndbvpu 500, 2, 9.png differ
diff --git a/assignment-3/submission/19307130062/img/uxcngpxjhl 500, 2, 5.png b/assignment-3/submission/19307130062/img/uxcngpxjhl 500, 2, 5.png
new file mode 100644
index 0000000000000000000000000000000000000000..a69e8952e596f1e116461a938f1d0e0c54397ad4
Binary files /dev/null and b/assignment-3/submission/19307130062/img/uxcngpxjhl 500, 2, 5.png differ
diff --git a/assignment-3/submission/19307130062/img/vhklkqwtkz 500, 2, 5.png b/assignment-3/submission/19307130062/img/vhklkqwtkz 500, 2, 5.png
new file mode 100644
index 0000000000000000000000000000000000000000..bc886e28b1c714d7f7a84d32cb7f0a0eaf9de361
Binary files /dev/null and b/assignment-3/submission/19307130062/img/vhklkqwtkz 500, 2, 5.png differ
diff --git a/assignment-3/submission/19307130062/img/wishwtfbzz 500, 2, 9.png b/assignment-3/submission/19307130062/img/wishwtfbzz 500, 2, 9.png
new file mode 100644
index 0000000000000000000000000000000000000000..dc82075241af36966c6e0e73228fe28e0455a078
Binary files /dev/null and b/assignment-3/submission/19307130062/img/wishwtfbzz 500, 2, 9.png differ
diff --git a/assignment-3/submission/19307130062/img/wuxvkrvwya 500, 2, 13.png b/assignment-3/submission/19307130062/img/wuxvkrvwya 500, 2, 13.png
new file mode 100644
index 0000000000000000000000000000000000000000..8ba9029afec51a0b3be7804f4ec6dc2421c5521b
Binary files /dev/null and b/assignment-3/submission/19307130062/img/wuxvkrvwya 500, 2, 13.png differ
diff --git a/assignment-3/submission/19307130062/img/wyrclifhqj 500, 2, 5.png b/assignment-3/submission/19307130062/img/wyrclifhqj 500, 2, 5.png
new file mode 100644
index 0000000000000000000000000000000000000000..de70dc270b7fa94f8ba14326119526d9ccaccf83
Binary files /dev/null and b/assignment-3/submission/19307130062/img/wyrclifhqj 500, 2, 5.png differ
diff --git a/assignment-3/submission/19307130062/img/xhynyscjpi Elbow Method ALL15 500, 2, 3.png b/assignment-3/submission/19307130062/img/xhynyscjpi Elbow Method ALL15 500, 2, 3.png
new file mode 100644
index 0000000000000000000000000000000000000000..d45b29105396db5fc6ee6942a5c8245434327ec8
Binary files /dev/null and b/assignment-3/submission/19307130062/img/xhynyscjpi Elbow Method ALL15 500, 2, 3.png differ
diff --git a/assignment-3/submission/19307130062/source.py b/assignment-3/submission/19307130062/source.py
new file mode 100644
index 0000000000000000000000000000000000000000..245ced90da1ec2939f98772a5013cb57bf1c4c1e
--- /dev/null
+++ b/assignment-3/submission/19307130062/source.py
@@ -0,0 +1,456 @@
+import numpy as np
+
+def distance(x, y, type = "euclidean", lp = -1.): 
+    """ 
+        支持多样化距离的计算,包括欧几里得距离、曼哈顿距离、切比雪夫距离
+        以及更一般的 Lp 范数(要求 p 为正实数,且不超过 100),操作不合法时
+        返回 0.
+    """ 
+    
+    x = x.astype(np.float64)
+    y = y.astype(np.float64)
+    
+    if type == "euclidean": 
+        return np.sqrt(sum((x - y)**2))
+        
+    elif type == "manhattan": 
+        return sum(abs(x - y))
+        
+    elif type == "chebyshev": 
+        return max(abs(x - y))
+        
+    elif type == "Lp":
+        if 0 < lp <= 100.: 
+            return sum((abs(x - y)) ** lp) ** (1. / lp)
+        else: 
+            print("Error: Lp-norm is illegal.")
+            return 0. 
+            
+    else: 
+        print("Error: can't identify the type of distance.")
+        return 0.
+        
+        
+def preprocess(x, type = "min_max"): 
+    """
+        支持 Min-Max 或 Z-score 方法的数据归一化预处理
+    """
+    
+    x = x.astype(np.float64)
+
+    if type == "min_max": 
+        for dim in range(x.shape[1]): 
+            dim_min, dim_max = min(x[:, dim]), max(x[:, dim])
+            x[:, dim] = (x[:, dim] - dim_min) / (dim_max - dim_min)
+        return x
+        
+    elif type == "z_score": 
+        for dim in range(x.shape[1]): 
+            std, mu = np.std(x[:, dim]), np.mean(x[:, dim])
+            x[:, dim] = (x[:, dim] - mu) / std
+        return x
+        
+    elif type == "none":
+        return x
+
+class KMeans:
+
+    def __init__(self, n_clusters, preprocess_type = "min_max"): # Min-Max normalization by default
+        self.K = n_clusters
+        self.preprocess_type = preprocess_type
+    
+    def fit(self, train_data, iter_times = 10, debug = False):
+    
+        train_data = preprocess(train_data, type = self.preprocess_type)
+        n = train_data.shape[0]                             # number of training samples
+        cluster_size = np.zeros((self.K, 1)).astype(int)    # number of points in each cluster, as integers
+        train_label = np.zeros((n)).astype(int)             # cluster index of each point, as integers
+    
+        # self.center  = np.random.rand(self.K, train_data.shape[1])
+        self.center = train_data[: self.K].copy() # use the first K samples as the initial centers (the data should be shuffled beforehand); the copy matters, since fill(0) below would otherwise zero out rows of train_data
+        
+        for T in range(iter_times): 
+        
+            # print("Iteration {0}:\t".format(T + 1), end = '')
+            
+            sum_of_distance = 0.
+        
+            for i in range(n):
+            
+                min_dist = distance(train_data[i], self.center[self.K - 1])
+                min_dist_cluster = self.K - 1
+                
+                for j in range(self.K - 1): 
+                    dis = distance(train_data[i], self.center[j])
+                    if dis < min_dist: 
+                        min_dist = dis
+                        min_dist_cluster = j
+                        
+                train_label[i] = min_dist_cluster
+                sum_of_distance += min_dist 
+                # assign the point to its nearest cluster center
+                
+            self.center.fill(0)
+            cluster_size.fill(0)
+            
+            for i in range(n): 
+                self.center[train_label[i]] += train_data[i]
+                cluster_size[train_label[i]] += 1
+            
+            for j in range(self.K): 
+                if cluster_size[j] == 0: 
+                    self.center[j] = np.random.rand(train_data.shape[1]) 
+                    # if a cluster received no points, draw a fresh random center for it
+                else: 
+                    self.center[j] /= cluster_size[j] 
+                    # otherwise move the center to the mean of its points
+            
+                
+            # print("{0}".format(sum_of_distance))
+            
+        if debug: 
+            return self.center, sum_of_distance
+    
+    def predict(self, test_data):
+    
+        test_data = preprocess(test_data, type = self.preprocess_type)
+        
+        n = test_data.shape[0]                  # number of test samples
+        test_label = np.zeros((n)).astype(int)  # cluster index of each point, as integers
+        
+        for i in range(n):
+            
+            min_dist = distance(test_data[i], self.center[self.K - 1])
+            min_dist_cluster = self.K - 1
+            
+            for j in range(self.K - 1): 
+                dis = distance(test_data[i], self.center[j])
+                if dis < min_dist: 
+                    min_dist = dis
+                    min_dist_cluster = j
+                    
+            test_label[i] = min_dist_cluster
+            # assign the point to its nearest cluster center
+            
+        sum_of_distance = 0
+        for i in range(n): 
+            sum_of_distance += distance(test_data[i], self.center[test_label[i]])
+        # total distance from each point to its cluster center (kept for debugging)
+          
+        return test_label
+
+class GaussianMixture:
+
+    def __init__(self, n_clusters, preprocess_type = "min_max"): # Min-Max normalization by default
+        self.K = n_clusters
+        self.preprocess_type = preprocess_type
+    
+    def fit(self, train_data, iter_times = 20, debug = False):  # supports multi-dimensional Gaussian mixture models
+    
+        train_data = preprocess(train_data, self.preprocess_type) 
+    
+        dim = train_data.shape[1]                   # data dimensionality
+        n   = train_data.shape[0]                   # number of training samples
+        gamma = np.zeros((n, self.K))               # responsibilities: how strongly each point belongs to each cluster
+        train_label = np.zeros((n)).astype(int)     # cluster index of each point, as integers
+        
+        self.mu     = np.random.rand(self.K, dim)               # means of the Gaussians
+        self.cov    = np.array([np.eye(dim)] * self.K) * 0.1    # covariance matrices of the Gaussians
+        self.alpha  = np.array([1.0 / self.K] * self.K)         # mixture weights
+        self.epsilon = 1e-10                                    # small constant guarding against numerical problems
+        
+        for T in range(iter_times): 
+            gamma.fill(0)
+        
+            """
+                查阅许多网上资料及 scipy 有关源码后,发现要想快速地(用矩阵乘法的形式)计算多个向量在某个
+                多维高斯分布下的 gamma,需要将 cov 矩阵逆分解为两个互为转置的矩阵的乘积,分别和前后也互为
+                转置关系的 (x - mu) 和 (x - mu)^T 相乘,方能实现批量计算。但上述方法较为复杂,此处采用
+                每次计算一项 gamma 的暴力方法实现。
+                
+                为了尽可能避免数值问题,此处先求出 log(gamma),最后再进行指数计算将其复原。
+            """
+            
+            for k in range(self.K): 
+                tail = np.log(self.alpha[k]) - 0.5 * (dim * np.log(2 * np.pi) + np.log(np.linalg.det(self.cov[k])))
+                # constant part of the weighted log-density (everything outside the exponent), precomputed per component
+                cov_inv = np.linalg.inv(self.cov[k])
+                # inverse of the covariance matrix
+                
+                for i in range(n): 
+                    dev = train_data[i] - self.mu[k] 
+                    gamma[i, k] = -0.5 * (np.matmul(np.matmul(dev, cov_inv), dev.T)) + tail
+                    # quadratic term inside the exponent
+
+            gamma = np.exp(gamma)
+            # exponentiate to recover gamma
+            gamma /= (gamma.sum(axis = 1).reshape(gamma.shape[0], -1))
+            # normalize each point's responsibilities across the clusters
+                
+            self.mu  = np.matmul(gamma.T, train_data) / gamma.sum(axis = 0).reshape(self.K, -1)
+            # update the Gaussian means (uses broadcasting)
+            for k in range(self.K): 
+                dev = gamma[:, k].reshape(n, -1) * (train_data - self.mu[k])
+                self.cov[k] = np.matmul(dev.T, train_data - self.mu[k]) / gamma[:, k].sum()
+                self.cov[k] += np.eye(dim) * self.epsilon # add \lambda I to the diagonal to keep the covariance matrix non-singular
+            # update the covariance matrices
+            self.alpha = gamma.sum(axis = 0) / n
+            # update the mixture weights
+            
+        for i in range(n):
+            train_label[i] = np.argmax(gamma[i, :])
+            # assign the point to its most probable cluster; argmax also breaks ties safely,
+            # whereas indexing with np.nonzero can return several indices
+            # (one could also treat gamma as a distribution and sample, but that is less stable)
+            
+        sum_of_distance = 0
+        for i in range(n): 
+            sum_of_distance += distance(train_data[i], self.mu[train_label[i]])
+        # total distance from each point to its cluster center
+        
+        if debug: 
+            return self.mu, sum_of_distance
+        
+    
+    def predict(self, test_data):
+    
+        test_data = preprocess(test_data, self.preprocess_type)
+        
+        n   = test_data.shape[0]                # number of test samples
+        dim = test_data.shape[1]                # data dimensionality
+        test_label = np.zeros((n)).astype(int)  # cluster index of each point, as integers
+        gamma = np.zeros((n, self.K))           # responsibilities: how strongly each point belongs to each cluster
+        
+        for k in range(self.K): 
+            tail = np.log(self.alpha[k]) - 0.5 * (dim * np.log(2 * np.pi) + np.log(np.linalg.det(self.cov[k])))
+            cov_inv = np.linalg.inv(self.cov[k])
+            for i in range(n): 
+                dev = test_data[i] - self.mu[k]
+                gamma[i, k] = -0.5 * (np.matmul(np.matmul(dev.T, cov_inv), dev)) + tail
+        # same responsibility computation as in fit
+                
+        for i in range(n):
+            test_label[i] = np.argmax(gamma[i, :])
+        # assign each point to its most probable cluster (argmax breaks ties safely)
+            
+        sum_of_distance = 0
+        for i in range(n): 
+            sum_of_distance += distance(test_data[i], self.mu[test_label[i]])
+        # total distance from each point to its cluster center (kept for debugging)
+       
+        return test_label
+   
+#----------------- Utils ----------------- #
+import matplotlib.pyplot as plt   
+from matplotlib import ticker
+
+def line_plot_init(n_clusters):
+    fig, ax = plt.subplots()
+    ax.xaxis.set_major_formatter(ticker.ScalarFormatter())  
+    ax.axvline(n_clusters, ls = '-', linewidth = 1.0, c = 'blue')
+    return fig, ax
+
+def line_plot(ax, X, Y, c, label): 
+    ax.plot(X, Y, color = c, linewidth = 1.0, label = label)
+    ax.scatter(X, Y, color = c, s = 4.0, marker = 'o')
+    ax.legend()
+    
+def line_plot_finish(fig, ax, filename):
+    fig.savefig(filename, dpi = 1000) 
+    fig.show()
+    
+def gen_random_filename(l = 10): 
+    s = ''
+    for i in range(l): 
+        s += (chr(np.random.randint(26) + ord('a')))
+    s += ' '
+    return s
+#----------------- Utils ----------------- #
+
+class ClusteringAlgorithm:
+    def __init__(self, algorithm_tag = "ALL", max_clusters = 15, n = 0, dim = 0, n_clusters = 0):
+        """
+            algorithm_tag:      需要尝试的算法,可从 "KMeans", "GMM" 和 "ALL"(全部尝试)中选择
+            max_clusters:       最大聚类数量
+            n, dim, n_clusters: 原始数据的生成参数
+        """
+        
+        self.algorithm_tag = algorithm_tag
+        self.max_clusters = max_clusters
+        self.n, self.dim, self.n_clusters = n, dim, n_clusters
+    
+    def fit(self, train_data):
+        fig, ax = line_plot_init(self.n_clusters)
+        # initialize the plot and mark the number of clusters used to generate the original data
+        
+        suffix = str(self.n) + ', ' + str(self.dim) + ', ' + str(self.n_clusters) + ".png" # record the parameters in the file name
+        
+        if self.algorithm_tag == "KMeans" or self.algorithm_tag == "ALL": 
+        
+            sod = np.zeros((self.max_clusters)) # initialize the sum_of_distance array (total distance from each point to its cluster center)
+            
+            for n_clusters in range(1, self.max_clusters): 
+                model = KMeans(n_clusters)
+                center, sod[n_clusters] = model.fit(train_data, iter_times = 20, debug = True)
+            # run KMeans once for every candidate number of clusters
+                
+            line_plot(ax, np.arange(1, self.max_clusters), sod[1: self.max_clusters], "C3", "KMeans")
+            # plot the sum_of_distance curve
+            
+            thres, min_sod, min_sod_K, step = 3, sod[1], 1, sod[1] * 0.1
+            # respectively: the observation budget, the current minimum sum_of_distance, its cluster count, and the required drop
+            for n_clusters in range(2, self.max_clusters): 
+                if sod[n_clusters] < min_sod - step: 
+                    min_sod = sod[n_clusters]
+                    min_sod_K = n_clusters
+                    thres = 3
+                else:
+                    thres -= 1
+                    if thres == 0: 
+                        break
+            """
+                这是自行定义的一套在 elbow method 中寻找拐点的方法,实际测试效果良好,非常符合人工观察结果
+                - 首先,由于仅有一个聚类时的 sum_of_distance 变化一般很大,因此将波动程度 'step' 定义为 sod[1] 的 10% (经验参数)
+                - 随后,我们从前向后扫描 sum_of_distance 数组,要求每次 sum_of_distance 数组相对当前得到的 sum_of_distance 的最小值至少降低波动程度 'step' 的数量。如果符合要求,用这一项更新 sum_of_distance 最小值,以及备选的聚类数,并恢复观察步数阈值 'thres' 数量。否则认为失败,观察步数阈值 'thres' 减少 1
+                - 如果观察步数阈值 'thres' 到达 0,则立刻结束扫描,当前的备选的聚类数即作为算法选择的聚类数,用于之后重新训练和预测
+            """
+                
+            ax.axvline(min_sod_K, ls = '--', linewidth = 1.0, c = 'C3')
+            # mark the cluster count the algorithm considers optimal
+            
+            # print(min_sod_K)
+            
+            self.model = KMeans(min_sod_K)
+            self.model.fit(train_data, iter_times = 50) 
+            # K-Means is the preferred algorithm; retrain it with more iterations
+                
+            
+        if self.algorithm_tag == "GMM" or self.algorithm_tag == "ALL": 
+        
+            sod = np.zeros((self.max_clusters)) # initialize the sum_of_distance array (total distance from each point to its cluster center)
+            
+            for n_clusters in range(1, self.max_clusters): 
+                model = GaussianMixture(n_clusters)
+                center, sod[n_clusters] = model.fit(train_data, iter_times = 20, debug = True)
+                
+            line_plot(ax, np.arange(1, self.max_clusters), sod[1: self.max_clusters], "C1", "GMM")
+            # plot the sum_of_distance curve
+            
+            thres, min_sod, min_sod_K, step = 3, sod[1], 1, sod[1] * 0.1
+            for n_clusters in range(2, self.max_clusters): 
+                if sod[n_clusters] < min_sod - step: 
+                    min_sod = sod[n_clusters]
+                    min_sod_K = n_clusters
+                    thres = 3
+                else:
+                    thres -= 1
+                    if thres == 0: 
+                        break
+            # same knee-selection procedure as above; see the detailed description there
+            
+            ax.axvline(min_sod_K, ls = '--', linewidth = 1.0, c = 'C1')
+            # mark the cluster count the algorithm considers optimal
+            
+            # print(min_sod_K)
+            
+            if self.algorithm_tag == "GMM": 
+                self.model = GaussianMixture(min_sod_K)
+                self.model.fit(train_data, iter_times = 50)
+                # K-Means is the preferred algorithm; GMM is used only when explicitly requested, retrained with more iterations
+            
+        line_plot_finish(fig, ax, gen_random_filename() + "Elbow Method " + self.algorithm_tag + str(self.max_clusters) + ' ' + suffix)
+    
+    def predict(self, test_data):
+        return self.model.predict(test_data)
+        
+        
+
+def data_plot_init():
+    return plt.subplots(1, 2)
+    
+def data_plot(fig, ax, pos, data, label, center): 
+
+    for i in range(data.shape[0]): 
+        ax[pos].scatter(data[i, 0], data[i, 1], color = "C" + str(label[i] % 10), marker = 'o')
+        # plot each point, colored by its cluster; the "CN" color cycle only defines C0-C9, so wrap the index with % 10
+    for i in range(center.shape[0]): 
+        ax[pos].scatter(center[i, 0], center[i, 1], color = "black", marker = 'x')
+        # plot the cluster centers
+        
+def data_plot_finish(fig, filename):
+    fig.savefig(filename, dpi = 1000)
+    # fig.show()
+    
+def test_model(n = 500, dim = 2, n_clusters = 5, type = "GMM"): 
+    """
+        n:              向量数
+        dim:            向量维数
+        n_clusters:     生成数据的高斯分布聚类数
+        type:           需要测试的算法,可从 "KMeans", "GMM" 和 "CA"(自动聚类)中选择
+    """
+
+    center = np.random.rand(n_clusters, dim) * 30.  # draw n_clusters random centers, scaled to widen the gaps
+    train_data = np.zeros((n, dim))                 # initialize the training data array
+    train_label = np.zeros((n)).astype(int)         # initialize the label array
+    
+    for i in range(n): 
+        train_label[i] = np.random.randint(n_clusters)
+        train_data[i] = np.random.normal(loc = center[train_label[i], :], scale = 1.)
+        # pick a random center for sample i and draw the sample from a Gaussian with that mean and standard deviation 1
+        
+    idx = np.arange(n)
+    np.random.shuffle(idx)
+    train_label, train_data = train_label[idx], train_data[idx]
+    # shuffle the data
+    
+    fig, ax = data_plot_init()
+    suffix = ' '
+    
+    if dim == 2:
+        suffix = str(n) + ', ' + str(dim) + ', ' + str(n_clusters) + ".png" 
+        data_plot(fig, ax, 0, train_data, train_label, center)
+        # visualize the data and record the parameters in the file name
+    
+    if type == "KMeans": # 测试 K-Means
+        model = KMeans(n_clusters)
+        _center, sod = model.fit(train_data, debug = True) # with debug = True, fit returns the trained centers and the total point-to-center distance
+        _label  = model.predict(train_data)
+        
+        err = 0
+        for i in range(n): 
+            for j in range(i): 
+                if (train_label[i] == train_label[j] and _label[i] != _label[j]) or (train_label[i] != train_label[j] and _label[i] == _label[j]): 
+                    err += 1
+        print("Accuracy: {0}%".format(100 - err / (n * (n - 1) / 2) * 100))
+        # over all pairs of points, count whether the same-cluster relation agrees between the original labels and the prediction, and report the agreement rate (the Rand index)
+        
+        if dim == 2:
+            data_plot(fig, ax, 1, preprocess(train_data), _label, _center)
+            # visualize the clustering result for 2-D data
+    
+    elif type == "GMM": # 测试 Gaussian Mixture Model
+        model = GaussianMixture(n_clusters)
+        _center, sod = model.fit(train_data, debug = True) # with debug = True, fit returns the trained centers and the total point-to-center distance
+        _label  = model.predict(train_data)
+        
+        err = 0
+        for i in range(n): 
+            for j in range(i): 
+                if (train_label[i] == train_label[j] and _label[i] != _label[j]) or (train_label[i] != train_label[j] and _label[i] == _label[j]): 
+                    err += 1
+        print("Accuracy: {0}%".format(100 - err / (n * (n - 1) / 2) * 100))
+        # over all pairs of points, count whether the same-cluster relation agrees between the original labels and the prediction, and report the agreement rate (the Rand index)
+        
+        if dim == 2:
+            data_plot(fig, ax, 1, preprocess(train_data), _label, _center)
+            # visualize the clustering result for 2-D data
+            
+    elif type == "CA": # 测试自动聚类算法
+        model = ClusteringAlgorithm(n = n, dim = dim, n_clusters = n_clusters)
+        model.fit(train_data)
+        model.predict(train_data)
+        
+    if dim == 2: 
+        data_plot_finish(fig, gen_random_filename() + suffix)
+
+# test_model() 
+# Start testing!
diff --git a/assignment-3/submission/19307130062/tester_demo.py b/assignment-3/submission/19307130062/tester_demo.py
new file mode 100644
index 0000000000000000000000000000000000000000..2eb8ff3e26ad5546412a857fc0bb1e870c08b6de
--- /dev/null
+++ b/assignment-3/submission/19307130062/tester_demo.py
@@ -0,0 +1,112 @@
+import numpy as np
+import sys
+
+from source import KMeans, GaussianMixture
+
+
+def shuffle(*datas):
+    data = np.concatenate(datas)
+    label = np.concatenate([
+        np.ones((d.shape[0],), dtype=int)*i
+        for (i, d) in enumerate(datas)
+    ])
+    N = data.shape[0]
+    idx = np.arange(N)
+    np.random.shuffle(idx)
+    data = data[idx]
+    label = label[idx]
+    return data, label
+
+
+def data_1():
+    mean = (1, 2)
+    cov = np.array([[73, 0], [0, 22]])
+    x = np.random.multivariate_normal(mean, cov, (800,))
+
+    mean = (16, -5)
+    cov = np.array([[21.2, 0], [0, 32.1]])
+    y = np.random.multivariate_normal(mean, cov, (200,))
+
+    mean = (10, 22)
+    cov = np.array([[10, 5], [5, 10]])
+    z = np.random.multivariate_normal(mean, cov, (1000,))
+
+    data, _ = shuffle(x, y, z)
+    return (data, data), 3
+
+
+def data_2():
+    train_data = np.array([
+        [23, 12, 173, 2134],
+        [99, -12, -126, -31],
+        [55, -145, -123, -342],
+    ])
+    return (train_data, train_data), 2
+
+
+def data_3():
+    train_data = np.array([
+        [23],
+        [-2999],
+        [-2955],
+    ])
+    return (train_data, train_data), 2
+
+
+def test_with_n_clusters(data_function, algorithm_class):
+    (train_data, test_data), n_clusters = data_function()
+    model = algorithm_class(n_clusters)
+    model.fit(train_data)
+    res = model.predict(test_data)
+    assert len(
+        res.shape) == 1 and res.shape[0] == test_data.shape[0], "shape of result is wrong"
+    return res
+
+
+def testcase_1_1():
+    test_with_n_clusters(data_1, KMeans)
+    return True
+
+
+def testcase_1_2():
+    res = test_with_n_clusters(data_2, KMeans)
+    return res[0] != res[1] and res[1] == res[2]
+
+
+def testcase_2_1():
+    test_with_n_clusters(data_1, GaussianMixture)
+    return True
+
+
+def testcase_2_2():
+    res = test_with_n_clusters(data_3, GaussianMixture)
+    return res[0] != res[1] and res[1] == res[2]
+
+
+def test_all(err_report=False):
+    testcases = [
+        ["KMeans-1", testcase_1_1, 4],
+        ["KMeans-2", testcase_1_2, 4],
+        # ["KMeans-3", testcase_1_3, 4],
+        # ["KMeans-4", testcase_1_4, 4],
+        # ["KMeans-5", testcase_1_5, 4],
+        ["GMM-1", testcase_2_1, 4],
+        ["GMM-2", testcase_2_2, 4],
+        # ["GMM-3", testcase_2_3, 4],
+        # ["GMM-4", testcase_2_4, 4],
+        # ["GMM-5", testcase_2_5, 4],
+    ]
+    sum_score = sum([case[2] for case in testcases])
+    score = 0
+    for case in testcases: 
+        res = case[2] if case[1]() else 0 
+        score += res
+        print("+ {:14} {}/{}".format(case[0], res, case[2]))
+    print("{:16} {}/{}".format("FINAL SCORE", score, sum_score))
+
+
+if __name__ == "__main__":
+    if len(sys.argv) > 1 and sys.argv[1] == "--report":
+        test_all(True)
+    else:
+        test_all()
+
Binary files /dev/null and b/assignment-3/submission/19307130062/img/uvqnndbvpu 500, 2, 9.png differ
diff --git a/assignment-3/submission/19307130062/img/uxcngpxjhl 500, 2, 5.png b/assignment-3/submission/19307130062/img/uxcngpxjhl 500, 2, 5.png
new file mode 100644
index 0000000000000000000000000000000000000000..a69e8952e596f1e116461a938f1d0e0c54397ad4
Binary files /dev/null and b/assignment-3/submission/19307130062/img/uxcngpxjhl 500, 2, 5.png differ
diff --git a/assignment-3/submission/19307130062/img/vhklkqwtkz 500, 2, 5.png b/assignment-3/submission/19307130062/img/vhklkqwtkz 500, 2, 5.png
new file mode 100644
index 0000000000000000000000000000000000000000..bc886e28b1c714d7f7a84d32cb7f0a0eaf9de361
Binary files /dev/null and b/assignment-3/submission/19307130062/img/vhklkqwtkz 500, 2, 5.png differ
diff --git a/assignment-3/submission/19307130062/img/wishwtfbzz 500, 2, 9.png b/assignment-3/submission/19307130062/img/wishwtfbzz 500, 2, 9.png
new file mode 100644
index 0000000000000000000000000000000000000000..dc82075241af36966c6e0e73228fe28e0455a078
Binary files /dev/null and b/assignment-3/submission/19307130062/img/wishwtfbzz 500, 2, 9.png differ
diff --git a/assignment-3/submission/19307130062/img/wuxvkrvwya 500, 2, 13.png b/assignment-3/submission/19307130062/img/wuxvkrvwya 500, 2, 13.png
new file mode 100644
index 0000000000000000000000000000000000000000..8ba9029afec51a0b3be7804f4ec6dc2421c5521b
Binary files /dev/null and b/assignment-3/submission/19307130062/img/wuxvkrvwya 500, 2, 13.png differ
diff --git a/assignment-3/submission/19307130062/img/wyrclifhqj 500, 2, 5.png b/assignment-3/submission/19307130062/img/wyrclifhqj 500, 2, 5.png
new file mode 100644
index 0000000000000000000000000000000000000000..de70dc270b7fa94f8ba14326119526d9ccaccf83
Binary files /dev/null and b/assignment-3/submission/19307130062/img/wyrclifhqj 500, 2, 5.png differ
diff --git a/assignment-3/submission/19307130062/img/xhynyscjpi Elbow Method ALL15 500, 2, 3.png b/assignment-3/submission/19307130062/img/xhynyscjpi Elbow Method ALL15 500, 2, 3.png
new file mode 100644
index 0000000000000000000000000000000000000000..d45b29105396db5fc6ee6942a5c8245434327ec8
Binary files /dev/null and b/assignment-3/submission/19307130062/img/xhynyscjpi Elbow Method ALL15 500, 2, 3.png differ
diff --git a/assignment-3/submission/19307130062/source.py b/assignment-3/submission/19307130062/source.py
new file mode 100644
index 0000000000000000000000000000000000000000..245ced90da1ec2939f98772a5013cb57bf1c4c1e
--- /dev/null
+++ b/assignment-3/submission/19307130062/source.py
@@ -0,0 +1,456 @@
+import numpy as np
+
+def distance(x, y, type = "euclidean", lp = -1.): 
+    """ 
+        Compute a variety of distances: Euclidean, Manhattan, Chebyshev,
+        and the general Lp norm (p must be a positive real number, at most 100).
+        Returns 0 for invalid arguments.
+    """ 
+    
+    x = x.astype(np.float64)
+    y = y.astype(np.float64)
+    
+    if type == "euclidean": 
+        return np.sqrt(np.sum((x - y) ** 2))
+        
+    elif type == "manhattan": 
+        return np.sum(np.abs(x - y))
+        
+    elif type == "chebyshev": 
+        return np.max(np.abs(x - y))
+        
+    elif type == "Lp":
+        if 0 < lp <= 100.: 
+            return np.sum(np.abs(x - y) ** lp) ** (1. / lp)
+        else: 
+            print("Error: Lp-norm is illegal.")
+            return 0. 
+            
+    else: 
+        print("Error: can't identify the type of distance.")
+        return 0.
+        
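As a quick sanity check, the distance variants above should agree with the familiar closed forms on a 3-4-5 triangle. The snippet below is a condensed, self-contained restatement of the `distance` helper for a standalone test, not the assignment's import path:

```python
import numpy as np

# Condensed restatement of the distance() helper above, for a standalone check.
def distance(x, y, type="euclidean", lp=-1.):
    x, y = np.asarray(x, float), np.asarray(y, float)
    if type == "euclidean":
        return np.sqrt(np.sum((x - y) ** 2))
    if type == "manhattan":
        return np.sum(np.abs(x - y))
    if type == "chebyshev":
        return np.max(np.abs(x - y))
    if type == "Lp" and 0 < lp <= 100.:
        return np.sum(np.abs(x - y) ** lp) ** (1. / lp)
    return 0.

a, b = np.array([0., 0.]), np.array([3., 4.])
print(distance(a, b))                   # Euclidean: 5.0
print(distance(a, b, "manhattan"))      # 7.0
print(distance(a, b, "chebyshev"))      # 4.0
print(distance(a, b, "Lp", lp=2.))      # L2 matches Euclidean: 5.0
```

Note that the `Lp` branch with `lp=2.` reproduces the Euclidean case, which is a convenient consistency check.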
+        
+def preprocess(x, type = "min_max"): 
+    """
+        Data normalization: Min-Max scaling or Z-score standardization.
+    """
+    
+    x = x.astype(np.float64)
+
+    if type == "min_max": 
+        for dim in range(x.shape[1]): 
+            dim_min, dim_max = np.min(x[:, dim]), np.max(x[:, dim])
+            if dim_max > dim_min:  # guard against division by zero on constant columns
+                x[:, dim] = (x[:, dim] - dim_min) / (dim_max - dim_min)
+        return x
+        
+    elif type == "z_score": 
+        for dim in range(x.shape[1]): 
+            std, mu = np.std(x[:, dim]), np.mean(x[:, dim])
+            if std > 0:  # guard against division by zero on constant columns
+                x[:, dim] = (x[:, dim] - mu) / std
+        return x
+        
+    elif type == "none":
+        return x
+        
+    print("Error: can't identify the type of preprocessing.")
+    return x
+
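A minimal sketch of what the `min_max` branch does to each column (self-contained restatement with a hypothetical `min_max` name, kept separate from the assignment code):

```python
import numpy as np

def min_max(x):
    # Scale every column to [0, 1] independently, as preprocess(..., "min_max") does.
    x = x.astype(np.float64)
    for d in range(x.shape[1]):
        lo, hi = x[:, d].min(), x[:, d].max()
        if hi > lo:  # skip constant columns
            x[:, d] = (x[:, d] - lo) / (hi - lo)
    return x

data = np.array([[1., 10.], [2., 20.], [3., 30.]])
print(min_max(data))
# each column is mapped to [0.0, 0.5, 1.0]
```

Each column is scaled independently, so features with very different ranges contribute comparably to the distance computations used by both clustering algorithms.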
+class KMeans:
+
+    def __init__(self, n_clusters, preprocess_type = "min_max"): # Min-Max normalization by default
+        self.K = n_clusters
+        self.preprocess_type = preprocess_type
+    
+    def fit(self, train_data, iter_times = 10, debug = False):
+    
+        train_data = preprocess(train_data, type = self.preprocess_type)
+        n = train_data.shape[0]                             # number of training samples
+        cluster_size = np.zeros((self.K, 1)).astype(int)    # number of points in each cluster (int)
+        train_label = np.zeros((n)).astype(int)             # cluster index of each point (int)
+    
+        # self.center  = np.random.rand(self.K, train_data.shape[1])
+        self.center = train_data[: self.K].copy() # take the first K samples as initial centers
+        # (the data should be shuffled beforehand); .copy() is essential here, otherwise
+        # center.fill(0) below would zero out the first K training points through the view
+        
+        for T in range(iter_times): 
+        
+            # print("Iteration {0}:\t".format(T + 1), end = '')
+            
+            sum_of_distance = 0.
+        
+            for i in range(n):
+            
+                min_dist = distance(train_data[i], self.center[self.K - 1])
+                min_dist_cluster = self.K - 1
+                
+                for j in range(self.K - 1): 
+                    dis = distance(train_data[i], self.center[j])
+                    if dis < min_dist: 
+                        min_dist = dis
+                        min_dist_cluster = j
+                        
+                train_label[i] = min_dist_cluster
+                sum_of_distance += min_dist 
+                # assign each point to its nearest cluster center
+                
+            self.center.fill(0)
+            cluster_size.fill(0)
+            
+            for i in range(n): 
+                self.center[train_label[i]] += train_data[i]
+                cluster_size[train_label[i]] += 1
+            
+            for j in range(self.K): 
+                if cluster_size[j] == 0: 
+                    self.center[j] = np.random.rand(train_data.shape[1]) 
+                    # if a cluster receives no points, re-draw a random center for it
+                else: 
+                    self.center[j] /= cluster_size[j] 
+                    # otherwise move the center to the mean of its points
+            
+                
+            # print("{0}".format(sum_of_distance))
+            
+        if debug: 
+            return self.center, sum_of_distance
+    
+    def predict(self, test_data):
+    
+        test_data = preprocess(test_data, type = self.preprocess_type)
+        
+        n = test_data.shape[0]                  # number of test samples
+        test_label = np.zeros((n)).astype(int)  # cluster index of each point (int)
+        
+        for i in range(n):
+            
+            min_dist = distance(test_data[i], self.center[self.K - 1])
+            min_dist_cluster = self.K - 1
+            
+            for j in range(self.K - 1): 
+                dis = distance(test_data[i], self.center[j])
+                if dis < min_dist: 
+                    min_dist = dis
+                    min_dist_cluster = j
+                    
+            test_label[i] = min_dist_cluster
+            # assign each point to its nearest cluster center
+            
+        sum_of_distance = 0
+        for i in range(n): 
+            sum_of_distance += distance(test_data[i], self.center[test_label[i]])
+        # total distance from each point to its cluster center (kept for debugging)
+          
+        return test_label
+
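The per-point assignment loops above can be collapsed into a single broadcasted computation. A sketch of the same nearest-center assignment (`assign_labels` is a hypothetical standalone helper, not part of the class):

```python
import numpy as np

def assign_labels(data, centers):
    # dists[i, j] = Euclidean distance from point i to center j, via broadcasting:
    # (n, 1, dim) - (K, dim) -> (n, K, dim), then norm over the last axis.
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)  # index of the nearest center for each point

data = np.array([[0., 0.], [10., 10.], [0.5, 0.]])
centers = np.array([[0., 0.], [10., 10.]])
print(assign_labels(data, centers))  # [0 1 0]
```

This trades the O(nK) Python loop for one vectorized NumPy call, which is typically much faster for the data sizes used in the experiments.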
+class GaussianMixture:
+
+    def __init__(self, n_clusters, preprocess_type = "min_max"): # Min-Max normalization by default
+        self.K = n_clusters
+        self.preprocess_type = preprocess_type
+    
+    def fit(self, train_data, iter_times = 20, debug = False):  # supports multi-dimensional Gaussian mixtures
+    
+        train_data = preprocess(train_data, self.preprocess_type) 
+    
+        dim = train_data.shape[1]                   # data dimensionality
+        n   = train_data.shape[0]                   # number of training samples
+        gamma = np.zeros((n, self.K))               # responsibilities: how likely each point belongs to each cluster
+        train_label = np.zeros((n)).astype(int)     # cluster index of each point (int)
+        
+        self.mu     = np.random.rand(self.K, dim)               # Gaussian means
+        self.cov    = np.array([np.eye(dim)] * self.K) * 0.1    # Gaussian covariance matrices
+        self.alpha  = np.array([1.0 / self.K] * self.K)         # mixture weights
+        self.epsilon = 1e-10                                    # small constant to avoid numerical issues
+        
+        for T in range(iter_times): 
+            gamma.fill(0)
+        
+            """
+                After consulting online references and the relevant scipy source: to compute the
+                responsibilities of many vectors under a multivariate Gaussian quickly (as matrix
+                products), the inverse covariance must be factored into a product of two mutually
+                transposed matrices, multiplied on either side by (x - mu) and (x - mu)^T, which
+                themselves are transposes of each other. Since that is fairly involved, this
+                implementation takes the straightforward approach of computing one gamma entry
+                at a time.
+                
+                To reduce numerical problems, log(gamma) is computed first and only exponentiated
+                back at the end.
+            """
+            
+            for k in range(self.K): 
+                tail = np.log(self.alpha[k]) - 0.5 * (dim * np.log(2 * np.pi) + np.log(np.linalg.det(self.cov[k])))
+                # log of the constant factor of the weighted Gaussian density, precomputed per cluster
+                cov_inv = np.linalg.inv(self.cov[k])
+                # inverse of the covariance matrix
+                
+                for i in range(n): 
+                    dev = train_data[i] - self.mu[k] 
+                    gamma[i, k] = -0.5 * (np.matmul(np.matmul(dev, cov_inv), dev.T)) + tail
+                    # exponent part of the log-density
+
+            gamma = np.exp(gamma)
+            # back from log space
+            gamma /= (gamma.sum(axis = 1).reshape(gamma.shape[0], -1))
+            # normalize each point's responsibilities over the clusters
+                
+            self.mu  = np.matmul(gamma.T, train_data) / gamma.sum(axis = 0).reshape(self.K, -1)
+            # update the Gaussian means (uses broadcasting)
+            for k in range(self.K): 
+                dev = gamma[:, k].reshape(n, -1) * (train_data - self.mu[k])
+                self.cov[k] = np.matmul(dev.T, train_data - self.mu[k]) / gamma[:, k].sum()
+                self.cov[k] += np.eye(dim) * self.epsilon # add \lambda I on the diagonal to keep the covariance non-singular
+            # update the covariance matrices (responsibility-weighted)
+            self.alpha = gamma.sum(axis = 0) / n
+            # update the mixture weights
+            
+        for i in range(n):
+            train_label[i] = gamma[i, :].argmax()
+            # hard-assign each point to its most probable cluster; argmax avoids the shape
+            # error np.nonzero would raise when several clusters tie exactly (one could
+            # instead sample from the distribution, but that tends to be less stable)
+            
+        sum_of_distance = 0
+        for i in range(n): 
+            sum_of_distance += distance(train_data[i], self.mu[train_label[i]])
+        # total distance from each point to its cluster mean
+        
+        if debug: 
+            return self.mu, sum_of_distance
+        
+    
+    def predict(self, test_data):
+    
+        test_data = preprocess(test_data, self.preprocess_type)
+        
+        n   = test_data.shape[0]                # number of test samples
+        dim = test_data.shape[1]                # data dimensionality
+        test_label = np.zeros((n)).astype(int)  # cluster index of each point (int)
+        gamma = np.zeros((n, self.K))           # responsibilities, as in fit
+        
+        for k in range(self.K): 
+            tail = np.log(self.alpha[k]) - 0.5 * (dim * np.log(2 * np.pi) + np.log(np.linalg.det(self.cov[k])))
+            cov_inv = np.linalg.inv(self.cov[k])
+            for i in range(n): 
+                dev = test_data[i] - self.mu[k]
+                gamma[i, k] = -0.5 * (np.matmul(np.matmul(dev.T, cov_inv), dev)) + tail
+        # same responsibility computation as in fit (log space alone suffices for the argmax below)
+                
+        for i in range(n):
+            test_label[i] = gamma[i, :].argmax()
+        # hard-assign each point to its most probable cluster
+            
+        sum_of_distance = 0
+        for i in range(n): 
+            sum_of_distance += distance(test_data[i], self.mu[test_label[i]])
+        # total distance from each point to its cluster mean (kept for debugging)
+       
+        return test_label
+   
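The batched density computation that the `fit()` docstring mentions (factoring the inverse covariance through two mutually transposed matrices, i.e. a Cholesky factor) can be sketched as below. This is an illustrative alternative with a hypothetical `log_gaussian_batch` name, not the code used above:

```python
import numpy as np

def log_gaussian_batch(X, mu, cov):
    # log N(x | mu, cov) for all rows of X at once.
    # With cov = L L^T (Cholesky), solving L z = (x - mu) gives
    # z^T z = (x - mu)^T cov^{-1} (x - mu) without ever forming cov^{-1}.
    L = np.linalg.cholesky(cov)
    dev = X - mu                                   # (n, dim)
    z = np.linalg.solve(L, dev.T)                  # (dim, n)
    maha = np.sum(z ** 2, axis=0)                  # squared Mahalanobis distance per row
    log_det = 2.0 * np.sum(np.log(np.diag(L)))     # log |cov| from the Cholesky diagonal
    dim = X.shape[1]
    return -0.5 * (dim * np.log(2 * np.pi) + log_det + maha)

X = np.array([[0., 0.], [1., 1.]])
print(log_gaussian_batch(X, np.zeros(2), np.eye(2)))
# at the mean: -log(2*pi); at (1, 1): -log(2*pi) - 1
```

Besides batching all n points into one triangular solve, this route is also numerically gentler than `np.linalg.inv` plus `np.linalg.det` for nearly singular covariances.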
+#----------------- Utils ----------------- #
+import matplotlib.pyplot as plt   
+from matplotlib import ticker
+
+def line_plot_init(n_clusters):
+    fig, ax = plt.subplots()
+    ax.xaxis.set_major_formatter(ticker.ScalarFormatter())  
+    ax.axvline(n_clusters, ls = '-', linewidth = 1.0, c = 'blue')
+    return fig, ax
+
+def line_plot(ax, X, Y, c, label): 
+    ax.plot(X, Y, color = c, linewidth = 1.0, label = label)
+    ax.scatter(X, Y, color = c, s = 4.0, marker = 'o')
+    ax.legend()
+    
+def line_plot_finish(fig, ax, filename):
+    fig.savefig(filename, dpi = 1000) 
+    fig.show()
+    
+def gen_random_filename(l = 10): 
+    s = ''
+    for i in range(l): 
+        s += (chr(np.random.randint(26) + ord('a')))
+    s += ' '
+    return s
+#----------------- Utils ----------------- #
+
+class ClusteringAlgorithm:
+    def __init__(self, algorithm_tag = "ALL", max_clusters = 15, n = 0, dim = 0, n_clusters = 0):
+        """
+            algorithm_tag:      which algorithm(s) to try: "KMeans", "GMM" or "ALL" (try both)
+            max_clusters:       maximum number of clusters to consider
+            n, dim, n_clusters: parameters used to generate the original data
+        """
+        
+        self.algorithm_tag = algorithm_tag
+        self.max_clusters = max_clusters
+        self.n, self.dim, self.n_clusters = n, dim, n_clusters
+    
+    def fit(self, train_data):
+        fig, ax = line_plot_init(self.n_clusters)
+        # initialize the plot and mark the number of clusters used to generate the data
+        
+        suffix = str(self.n) + ', ' + str(self.dim) + ', ' + str(self.n_clusters) + ".png" # record the parameters in the file name
+        
+        if self.algorithm_tag == "KMeans" or self.algorithm_tag == "ALL": 
+        
+            sod = np.zeros((self.max_clusters)) # sum_of_distance (total point-to-center distance) per cluster count
+            
+            for n_clusters in range(1, self.max_clusters): 
+                model = KMeans(n_clusters)
+                center, sod[n_clusters] = model.fit(train_data, iter_times = 20, debug = True)
+                
+            line_plot(ax, np.arange(1, self.max_clusters), sod[1: self.max_clusters], "C3", "KMeans")
+            # plot the sum_of_distance curve
+            
+            thres, min_sod, min_sod_K, step = 3, sod[1], 1, sod[1] * 0.1
+            # patience counter, smallest sum_of_distance so far, its cluster count, and the fluctuation threshold
+            for n_clusters in range(2, self.max_clusters): 
+                if sod[n_clusters] < min_sod - step: 
+                    min_sod = sod[n_clusters]
+                    min_sod_K = n_clusters
+                    thres = 3
+                else:
+                    thres -= 1
+                    if thres == 0: 
+                        break
+            """
+                A self-defined way of locating the knee in the elbow method; in practice it matches
+                manual inspection well:
+                - Since the sum_of_distance with a single cluster usually dwarfs the rest of the
+                  curve, the fluctuation threshold 'step' is set to 10% of sod[1] (an empirical parameter).
+                - Scan the sum_of_distance array left to right, requiring each accepted value to be
+                  at least 'step' below the current minimum. On success, update the minimum and the
+                  candidate cluster count and reset the patience counter 'thres'; on failure,
+                  decrease 'thres' by 1.
+                - When 'thres' reaches 0, stop scanning; the current candidate becomes the chosen
+                  cluster count, used to retrain and predict afterwards.
+            """
+                
+            ax.axvline(min_sod_K, ls = '--', linewidth = 1.0, c = 'C3')
+            # mark the cluster count the algorithm considers best
+            
+            # print(min_sod_K)
+            
+            self.model = KMeans(min_sod_K)
+            self.model.fit(train_data, iter_times = 50) 
+            # prefer K-Means; retrain with more iterations
+                
+            
+        if self.algorithm_tag == "GMM" or self.algorithm_tag == "ALL": 
+        
+            sod = np.zeros((self.max_clusters)) # sum_of_distance (total point-to-center distance) per cluster count
+            
+            for n_clusters in range(1, self.max_clusters): 
+                model = GaussianMixture(n_clusters)
+                center, sod[n_clusters] = model.fit(train_data, iter_times = 20, debug = True)
+                
+            line_plot(ax, np.arange(1, self.max_clusters), sod[1: self.max_clusters], "C1", "GMM")
+            # plot the sum_of_distance curve
+            
+            thres, min_sod, min_sod_K, step = 3, sod[1], 1, sod[1] * 0.1
+            for n_clusters in range(2, self.max_clusters): 
+                if sod[n_clusters] < min_sod - step: 
+                    min_sod = sod[n_clusters]
+                    min_sod_K = n_clusters
+                    thres = 3
+                else:
+                    thres -= 1
+                    if thres == 0: 
+                        break
+            # same knee-finding procedure as above; see the detailed description there
+            
+            ax.axvline(min_sod_K, ls = '--', linewidth = 1.0, c = 'C1')
+            # mark the cluster count the algorithm considers best
+            
+            # print(min_sod_K)
+            
+            if self.algorithm_tag == "GMM": 
+                self.model = GaussianMixture(min_sod_K)
+                self.model.fit(train_data, iter_times = 50)
+                # K-Means is preferred; GMM is only used when explicitly requested. Retrain with more iterations.
+            
+        line_plot_finish(fig, ax, gen_random_filename() + "Elbow Method " + self.algorithm_tag + str(self.max_clusters) + ' ' + suffix)
+    
+    def predict(self, test_data):
+        return self.model.predict(test_data)
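The knee-finding heuristic described in `fit()` can be isolated into a small standalone function. The sketch below mirrors the logic above (`find_knee` is a hypothetical name, with `patience` and `step_frac` exposing the two empirical parameters):

```python
import numpy as np

def find_knee(sod, patience=3, step_frac=0.1):
    # sod[k] = total point-to-center distance with k clusters; sod[0] is unused.
    step = sod[1] * step_frac          # required improvement for an accepted step
    best, best_k, thres = sod[1], 1, patience
    for k in range(2, len(sod)):
        if sod[k] < best - step:       # a real improvement: accept k as the candidate
            best, best_k, thres = sod[k], k, patience
        else:                          # mere fluctuation: lose patience
            thres -= 1
            if thres == 0:
                break
    return best_k

# A curve that drops sharply until k = 4 and then flattens out:
sod = np.array([0., 100., 60., 30., 12., 11., 10.5, 10.2])
print(find_knee(sod))  # 4
```

Extracting the heuristic this way would also remove the duplication between the KMeans and GMM branches of `ClusteringAlgorithm.fit`.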
+        
+        
+
+def data_plot_init():
+    return plt.subplots(1, 2)
+    
+def data_plot(fig, ax, pos, data, label, center): 
+
+    for i in range(data.shape[0]): 
+        ax[pos].scatter(data[i, 0], data[i, 1], color = "C" + str(label[i]), marker = 'o')
+        # plot each point, colored by its cluster
+    for i in range(center.shape[0]): 
+        ax[pos].scatter(center[i, 0], center[i, 1], color = "black", marker = 'x')
+        # plot the cluster centers
+        
+def data_plot_finish(fig, filename):
+    fig.savefig(filename, dpi = 1000)
+    # fig.show()
+    
+def test_model(n = 500, dim = 2, n_clusters = 5, type = "GMM"): 
+    """
+        n:              number of vectors
+        dim:            vector dimensionality
+        n_clusters:     number of Gaussian clusters used to generate the data
+        type:           algorithm to test: "KMeans", "GMM" or "CA" (automatic clustering)
+    """
+
+    center = np.random.rand(n_clusters, dim) * 30.  # draw n_clusters random centers, scaled up to widen the gaps
+    train_data = np.zeros((n, dim))                 # training data array
+    train_label = np.zeros((n)).astype(int)         # ground-truth label array
+    
+    for i in range(n): 
+        train_label[i] = np.random.randint(n_clusters)
+        train_data[i] = np.random.normal(loc = center[train_label[i], :], scale = 1.)
+        # pick a random center for point i and sample around it with standard deviation 1
+        
+    idx = np.arange(n)
+    np.random.shuffle(idx)
+    train_label, train_data = train_label[idx], train_data[idx]
+    # shuffle the data
+    
+    fig, ax = data_plot_init()
+    suffix = ' '
+    
+    if dim == 2:
+        suffix = str(n) + ', ' + str(dim) + ', ' + str(n_clusters) + ".png" 
+        data_plot(fig, ax, 0, train_data, train_label, center)
+        # visualize the data; the file name records the parameters
+    
+    if type == "KMeans": # test K-Means
+        model = KMeans(n_clusters)
+        _center, sod = model.fit(train_data, debug = True) # with debug = True, fit returns the trained centers and the total point-to-center distance
+        _label  = model.predict(train_data)
+        
+        err = 0
+        for i in range(n): 
+            for j in range(i): 
+                if (train_label[i] == train_label[j] and _label[i] != _label[j]) or (train_label[i] != train_label[j] and _label[i] == _label[j]): 
+                    err += 1
+        print("Accuracy: {0}%".format(100 - err / (n * (n - 1) / 2) * 100))
+        # for every pair of points, check whether ground truth and prediction agree on
+        # "same cluster or not"; the fraction of agreeing pairs is the accuracy (Rand index)
+        
+        if dim == 2:
+            data_plot(fig, ax, 1, preprocess(train_data), _label, _center)
+            # visualize the clustering of two-dimensional data
+    
+    elif type == "GMM": # test Gaussian Mixture Model
+        model = GaussianMixture(n_clusters)
+        _center, sod = model.fit(train_data, debug = True) # with debug = True, fit returns the trained centers and the total point-to-center distance
+        _label  = model.predict(train_data)
+        
+        err = 0
+        for i in range(n): 
+            for j in range(i): 
+                if (train_label[i] == train_label[j] and _label[i] != _label[j]) or (train_label[i] != train_label[j] and _label[i] == _label[j]): 
+                    err += 1
+        print("Accuracy: {0}%".format(100 - err / (n * (n - 1) / 2) * 100))
+        # same pairwise agreement (Rand index) accuracy as above
+        
+        if dim == 2:
+            data_plot(fig, ax, 1, preprocess(train_data), _label, _center)
+            # visualize the clustering of two-dimensional data
+            
+    elif type == "CA": # test the automatic clustering algorithm
+        model = ClusteringAlgorithm(n = n, dim = dim, n_clusters = n_clusters)
+        model.fit(train_data)
+        model.predict(train_data)
+        
+    if dim == 2: 
+        data_plot_finish(fig, gen_random_filename() + suffix)
+
+# test_model() 
+# run the tests!
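The pairwise agreement score printed by `test_model` is known as the Rand index, which is invariant to the arbitrary numbering of clusters. A minimal standalone restatement (`rand_index` is a hypothetical helper name):

```python
from itertools import combinations

def rand_index(a, b):
    # Fraction of point pairs on which labelings a and b agree about
    # "same cluster" vs "different cluster"; label names do not matter.
    agree = sum((x1 == x2) == (y1 == y2)
                for (x1, y1), (x2, y2) in combinations(list(zip(a, b)), 2))
    n = len(a)
    return agree / (n * (n - 1) / 2)

truth = [0, 0, 1, 1]
pred  = [1, 1, 0, 0]   # same partition, different label names
print(rand_index(truth, pred))  # 1.0
```

Label invariance is exactly why `test_model` compares pairs rather than labels directly: a clustering that recovers the true partition under a permuted numbering still scores 100%.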
diff --git a/assignment-3/submission/19307130062/tester_demo.py b/assignment-3/submission/19307130062/tester_demo.py
new file mode 100644
index 0000000000000000000000000000000000000000..2eb8ff3e26ad5546412a857fc0bb1e870c08b6de
--- /dev/null
+++ b/assignment-3/submission/19307130062/tester_demo.py
@@ -0,0 +1,112 @@
+import numpy as np
+import sys
+
+from source import KMeans, GaussianMixture
+
+
+def shuffle(*datas):
+    data = np.concatenate(datas)
+    label = np.concatenate([
+        np.ones((d.shape[0],), dtype=int)*i
+        for (i, d) in enumerate(datas)
+    ])
+    N = data.shape[0]
+    idx = np.arange(N)
+    np.random.shuffle(idx)
+    data = data[idx]
+    label = label[idx]
+    return data, label
+
+
+def data_1():
+    mean = (1, 2)
+    cov = np.array([[73, 0], [0, 22]])
+    x = np.random.multivariate_normal(mean, cov, (800,))
+
+    mean = (16, -5)
+    cov = np.array([[21.2, 0], [0, 32.1]])
+    y = np.random.multivariate_normal(mean, cov, (200,))
+
+    mean = (10, 22)
+    cov = np.array([[10, 5], [5, 10]])
+    z = np.random.multivariate_normal(mean, cov, (1000,))
+
+    data, _ = shuffle(x, y, z)
+    return (data, data), 3
+
+
+def data_2():
+    train_data = np.array([
+        [23, 12, 173, 2134],
+        [99, -12, -126, -31],
+        [55, -145, -123, -342],
+    ])
+    return (train_data, train_data), 2
+
+
+def data_3():
+    train_data = np.array([
+        [23],
+        [-2999],
+        [-2955],
+    ])
+    return (train_data, train_data), 2
+
+
+def test_with_n_clusters(data_function, algorithm_class):
+    (train_data, test_data), n_clusters = data_function()
+    model = algorithm_class(n_clusters)
+    model.fit(train_data)
+    res = model.predict(test_data)
+    assert len(
+        res.shape) == 1 and res.shape[0] == test_data.shape[0], "shape of result is wrong"
+    return res
+
+
+def testcase_1_1():
+    test_with_n_clusters(data_1, KMeans)
+    return True
+
+
+def testcase_1_2():
+    res = test_with_n_clusters(data_2, KMeans)
+    return res[0] != res[1] and res[1] == res[2]
+
+
+def testcase_2_1():
+    test_with_n_clusters(data_1, GaussianMixture)
+    return True
+
+
+def testcase_2_2():
+    res = test_with_n_clusters(data_3, GaussianMixture)
+    return res[0] != res[1] and res[1] == res[2]
+
+
+def test_all(err_report=False):
+    testcases = [
+        ["KMeans-1", testcase_1_1, 4],
+        ["KMeans-2", testcase_1_2, 4],
+        # ["KMeans-3", testcase_1_3, 4],
+        # ["KMeans-4", testcase_1_4, 4],
+        # ["KMeans-5", testcase_1_5, 4],
+        ["GMM-1", testcase_2_1, 4],
+        ["GMM-2", testcase_2_2, 4],
+        # ["GMM-3", testcase_2_3, 4],
+        # ["GMM-4", testcase_2_4, 4],
+        # ["GMM-5", testcase_2_5, 4],
+    ]
+    sum_score = sum([case[2] for case in testcases])
+    score = 0
+    for case in testcases: 
+        res = case[2] if case[1]() else 0 
+        score += res
+        print("+ {:14} {}/{}".format(case[0], res, case[2]))
+    print("{:16} {}/{}".format("FINAL SCORE", score, sum_score))
+
+
+if __name__ == "__main__":
+    if len(sys.argv) > 1 and sys.argv[1] == "--report":
+        test_all(True)
+    else:
+        test_all()