
How to Use Mahout Canopy

Published: 2021-12-16 16:59:05  Source: Yisu Cloud (億速云)  Author: iii  Category: Cloud computing

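This article explains how to use canopy clustering in Mahout. Many people are unsure how to use it in practice, so I went through a range of material and put together a simple, practical procedure. I hope it answers your questions. Let's get started.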

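How canopy works: canopy is a clustering algorithm. It is simple and fast but imprecise, a small but elegant method whose procedure is as follows:
1. Let the sample set be S, and choose two thresholds t1 and t2 with t1 > t2.
2. Take any point p from S as a new canopy C, and remove p from S.
3. Compute the distance dist from every point remaining in S to p.
4. If dist < t1, add the point to C.
5. If dist < t2, remove the point from S (it is strongly bound to C and will not become the center of another canopy).
Repeat steps 2-5 until S is empty.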
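The numbered steps can be sketched in Python as below. This is a minimal illustration, not Mahout's code: the function names are hypothetical, plain Euclidean distance stands in for the cheap distance measure, and the "take any point" of step 2 is modeled as a seeded random choice.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, standing in for the 'cheap' distance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points: List[Point], t1: float, t2: float,
           seed: int = 0) -> List[Tuple[Point, List[Point]]]:
    """Hypothetical helper: return (center, members) pairs.

    A point may belong to several canopies (overlap is allowed)."""
    if t1 <= t2:
        raise ValueError("canopy requires t1 > t2")
    rng = random.Random(seed)
    remaining = list(points)                  # step 1: the set S
    canopies: List[Tuple[Point, List[Point]]] = []
    while remaining:                          # repeat steps 2-5 until S is empty
        p = remaining.pop(rng.randrange(len(remaining)))  # step 2: pick p, remove it from S
        members = [p]
        survivors = []
        for q in remaining:
            d = euclidean(p, q)               # step 3: distance to p
            if d < t1:                        # step 4: loosely bound, joins C
                members.append(q)
            if d >= t2:                       # step 5: strongly bound points (d < t2) leave S
                survivors.append(q)
        remaining = survivors
        canopies.append((p, members))
    return canopies


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.4, 10.0)]
    for center, members in canopy(pts, t1=3.0, t2=1.0):
        print(center, members)
```

With t1 = 3 and t2 = 1, these four points form two canopies of two points each, whichever point is drawn first. Points whose distance falls between t2 and t1 stay in S, which is how canopies come to overlap.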
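T1 and T2 parameters
If T1 is too large, many points will belong to several canopies, which can push the cluster centers close together and make the clusters hard to tell apart.
If T2 is too large, more points become strongly bound, which reduces the number of clusters; if T2 is too small, the number of clusters grows, and so does the computation time.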
Mahout implements canopy clustering rather cleverly: the whole clustering process takes just two map operations and one reduce operation.
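The canopy construction process can be summarized as follows. Iterate over the point set S using two thresholds t1 and t2, with t1 > t2. For each point, compute its distance to the existing canopy centers with a cheap distance measure. If the distance to a canopy is less than t1, add the point to that canopy; if it is less than t2, the point will not become the center of a new canopy. Repeat until every point in S has been processed.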
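The per-point rule above, combined with the two-map, one-reduce split, might look roughly like the following Python sketch. It is an approximation under stated assumptions, not Mahout's source: the names `Canopy`, `add_point_to_canopies`, `canopy_map`, `canopy_reduce` and `classify` are hypothetical; squared Euclidean distance is assumed because that is the measure the log below reports; each canopy's center is assumed to stay fixed during a pass; and the second map is simplified to nearest-center assignment. Each mapper clusters its own split and emits centroids, and a single reducer reruns the same rule over all of them.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class Canopy:
    """Hypothetical canopy whose center stays fixed during a pass (an assumption)."""

    def __init__(self, center: Vector) -> None:
        self.center = center
        self.points: List[Vector] = [center]  # the founding point is also a member

    def centroid(self) -> Vector:
        n = len(self.points)
        return tuple(sum(col) / n for col in zip(*self.points))


def add_point_to_canopies(point: Vector, canopies: List[Canopy],
                          t1: float, t2: float) -> None:
    strongly_bound = False
    for c in canopies:
        d = squared_euclidean(c.center, point)
        if d < t1:                 # within T1: the point joins this canopy
            c.points.append(point)
        if d < t2:                 # within T2: the point may not found a new canopy
            strongly_bound = True
    if not strongly_bound:
        canopies.append(Canopy(point))


def _build(points: List[Vector], t1: float, t2: float) -> List[Vector]:
    canopies: List[Canopy] = []
    for p in points:
        add_point_to_canopies(p, canopies, t1, t2)
    return [c.centroid() for c in canopies]


def canopy_map(split: List[Vector], t1: float, t2: float) -> List[Vector]:
    """First map: canopies over one input split, one centroid emitted per canopy."""
    return _build(split, t1, t2)


def canopy_reduce(centroids: List[Vector], t1: float, t2: float) -> List[Vector]:
    """Single reducer: the same rule applied to every mapper's centroids."""
    return _build(centroids, t1, t2)


def classify(point: Vector, centers: List[Vector]) -> int:
    """Second map (simplified): index of the nearest final canopy center."""
    return min(range(len(centers)), key=lambda i: squared_euclidean(centers[i], point))


if __name__ == "__main__":
    splits = [[(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)], [(0.2, 0.0), (5.1, 5.0)]]
    t1, t2 = 1.0, 0.25
    centroids = [c for split in splits for c in canopy_map(split, t1, t2)]
    centers = canopy_reduce(centroids, t1, t2)
    print(centers)
    print([classify(p, centers) for split in splits for p in split])
```

Reusing one procedure in both the mapper and the reducer is what keeps the job small: the reducer only ever sees a handful of centroids, not the raw points.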
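Distance measures
Mahout's distance measures implement the org.apache.mahout.common.distance.DistanceMeasure interface. Common implementations:
CosineDistanceMeasure: cosine distance
SquaredEuclideanDistanceMeasure: the square of the Euclidean distance
EuclideanDistanceMeasure: Euclidean distance
ManhattanDistanceMeasure: Manhattan (city-block) distance, used fairly often in image processing
TanimotoDistanceMeasure: Tanimoto distance, based on the Jaccard similarity; weighted variants of the Euclidean and Manhattan distances are also provided (WeightedEuclideanDistanceMeasure, WeightedManhattanDistanceMeasure)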
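For reference, the textbook formulas behind these measures are sketched below in Python. The function names are hypothetical, and Mahout's classes may differ in details such as zero-vector handling; the weighted Euclidean form, sqrt(sum of w_i (a_i - b_i)^2), is an assumption. Note that the squared Euclidean distance skips the square root, so when that measure is used, T1 and T2 are compared against squared values.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def squared_euclidean(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a: Vec, b: Vec) -> float:
    return math.sqrt(squared_euclidean(a, b))


def manhattan(a: Vec, b: Vec) -> float:
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vec, b: Vec) -> float:
    """1 - cosine similarity; this sketch raises for a zero vector."""
    na, nb = math.sqrt(_dot(a, a)), math.sqrt(_dot(b, b))
    if na == 0 or nb == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - _dot(a, b) / (na * nb)


def tanimoto_distance(a: Vec, b: Vec) -> float:
    """1 - a.b / (|a|^2 + |b|^2 - a.b), the real-valued extension of Jaccard."""
    ab = _dot(a, b)
    denom = _dot(a, a) + _dot(b, b) - ab
    if denom == 0:
        raise ValueError("Tanimoto distance is undefined for two zero vectors")
    return 1.0 - ab / denom


def weighted_euclidean(a: Vec, b: Vec, w: Vec) -> float:
    """Assumed form: each squared difference is scaled by its weight."""
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))


if __name__ == "__main__":
    p, q = (0.0, 0.0), (3.0, 4.0)
    print(squared_euclidean(p, q), euclidean(p, q), manhattan(p, q))
    print(cosine_distance((1, 0), (0, 1)), tanimoto_distance((1, 2), (1, 2)))
```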
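Points to note when using canopy
1. First, choose the cheap distance measure carefully. Whether it is based on an attribute of the model or on some external attribute has a large effect on how the canopies are distributed.
2. The values of T1 and T2 determine the degree of overlap F and the granularity of the canopies.
3. Canopy can help remove outliers, which k-means cannot do on its own. After building the canopies, you can delete those containing relatively few points, since such canopies often consist of outliers.
4. Setting a sensible minimum number of points per canopy, and taking the number of canopies that remain as the number of cluster centers k, tends to work well.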
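Tips 3 and 4 can be combined as in the following Python sketch. It assumes canopies are given as (center, members) pairs, and the helper `prune_canopies` and its `min_points` cutoff are hypothetical, not a Mahout option. Canopies below the cutoff are dropped, points found only in dropped canopies are reported as outliers, and the surviving centers give k along with an initial set of centers for k-means.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
CanopyPair = Tuple[Point, List[Point]]  # (center, members)


def prune_canopies(canopies: Sequence[CanopyPair],
                   min_points: int) -> Tuple[int, List[Point], List[Point]]:
    """Hypothetical helper: drop canopies with fewer than min_points members.

    Returns (k, initial_centers, outliers), where outliers are points that
    appear only in dropped canopies."""
    kept = [(c, m) for c, m in canopies if len(m) >= min_points]
    kept_points = {p for _, m in kept for p in m}
    dropped_points = {p for _, m in canopies if len(m) < min_points for p in m}
    outliers = sorted(dropped_points - kept_points)
    return len(kept), [c for c, _ in kept], outliers


if __name__ == "__main__":
    canopies = [
        ((0.0, 0.0), [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]),
        ((10.0, 10.0), [(10.0, 10.0), (10.2, 10.0)]),
        ((50.0, 50.0), [(50.0, 50.0)]),  # a lone, far-away point
    ]
    k, centers, outliers = prune_canopies(canopies, min_points=2)
    print(k, centers, outliers)
```

A point that sits in both a small and a large canopy is not treated as an outlier, since canopies may overlap.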
[root@localhost bin]# hadoop fs -mkdir /20140824
[root@localhost data]# vi test-data.csv
1 -0.213  -0.956  -0.003  0.056  0.091  0.017  -0.024  1
1 3.147  2.129  -0.006  -0.056  -0.063  -0.002  0.109  0
1 -2.165  -2.718  -0.008  0.043  -0.103  -0.156  -0.024  1
1 -4.337  -2.686  -0.012  0.122  0.082  -0.021  -0.042  1
[root@localhost data]# hadoop fs -put test-data.csv /20140824
[root@localhost mahout-distribution-0.7]# hadoop jar mahout-examples-0.7-job.jar org.apache.mahout.clustering.syntheticcontrol.canopy.Job -i /20140824/test-data.csv -o /20140824 -t1 10 -t2 1
16/12/05 05:37:09 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/12/05 05:37:13 INFO input.FileInputFormat: Total input paths to process : 1
16/12/05 05:37:14 INFO mapreduce.JobSubmitter: number of splits:1
16/12/05 05:37:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480730026445_0005
16/12/05 05:37:17 INFO impl.YarnClientImpl: Submitted application application_1480730026445_0005
16/12/05 05:37:17 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1480730026445_0005/
16/12/05 05:37:17 INFO mapreduce.Job: Running job: job_1480730026445_0005
16/12/05 05:38:26 INFO mapreduce.Job: Job job_1480730026445_0005 running in uber mode : false
16/12/05 05:38:27 INFO mapreduce.Job:  map 0% reduce 0%
16/12/05 05:39:25 INFO mapreduce.Job:  map 100% reduce 0%
16/12/05 05:39:28 INFO mapreduce.Job: Job job_1480730026445_0005 completed successfully
16/12/05 05:39:30 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=105369
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=339
        HDFS: Number of bytes written=457
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=51412
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=51412
        Total vcore-seconds taken by all map tasks=51412
        Total megabyte-seconds taken by all map tasks=52645888
    Map-Reduce Framework
        Map input records=4
        Map output records=4
        Input split bytes=108
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=140
        CPU time spent (ms)=1620
        Physical memory (bytes) snapshot=87416832
        Virtual memory (bytes) snapshot=841273344
        Total committed heap usage (bytes)=15597568
    File Input Format Counters
        Bytes Read=231
    File Output Format Counters
        Bytes Written=457
16/12/05 05:39:31 INFO canopy.CanopyDriver: Build Clusters Input: /20140824/data Out: /20140824 Measure: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure@79b0cd8f t1: 10.0 t2: 1.0
16/12/05 05:39:32 INFO client.RMProxy: Connecting to ResourceManager at hadoop02/127.0.0.1:8032
16/12/05 05:39:33 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/12/05 05:39:37 INFO input.FileInputFormat: Total input paths to process : 1
16/12/05 05:39:38 INFO mapreduce.JobSubmitter: number of splits:1
16/12/05 05:39:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1480730026445_0006
16/12/05 05:39:38 INFO impl.YarnClientImpl: Submitted application application_1480730026445_0006
16/12/05 05:39:39 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1480730026445_0006/
16/12/05 05:39:39 INFO mapreduce.Job: Running job: job_1480730026445_0006
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=105814
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1970
        HDFS: Number of bytes written=527
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=26957
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=26957
        Total vcore-seconds taken by all map tasks=26957
        Total megabyte-seconds taken by all map tasks=27603968
    Map-Reduce Framework
        Map input records=4
        Map output records=4
        Input split bytes=112
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=134
        CPU time spent (ms)=1880
        Physical memory (bytes) snapshot=96550912
        Virtual memory (bytes) snapshot=841433088
        Total committed heap usage (bytes)=15597568
    File Input Format Counters
        Bytes Read=457
    File Output Format Counters
        Bytes Written=527
C-0{n=2 c=[1.000, -3.794, -2.694, -0.011, 0.102, 0.036, -0.055, -0.038, 1.000] r=[1:0.543, 2:0.008, 3:0.001, 4:0.020, 5:0.046, 6:0.034, 7:0.004]}
    Weight : [props - optional]:  Point:
    1.0: [1.000, -4.337, -2.686, -0.012, 0.122, 0.082, -0.021, -0.042, 1.000]
C-1{n=2 c=[1.000, -2.220, -2.270, -0.008, 0.066, -0.008, -0.079, -0.029, 1.000] r=[1:1.031, 2:0.433, 3:0.002, 4:0.016, 5:0.002, 6:0.010, 7:0.005]}
    Weight : [props - optional]:  Point:
    1.0: [1.000, -2.165, -2.718, -0.008, 0.043, -0.103, -0.156, -0.024, 1.000]
C-2{n=1 c=[0:1.000, 1:3.147, 2:2.129, 3:-0.006, 4:-0.056, 5:-0.063, 6:-0.002, 7:0.109] r=[]}
    Weight : [props - optional]:  Point:
    1.0: [0:1.000, 1:3.147, 2:2.129, 3:-0.006, 4:-0.056, 5:-0.063, 6:-0.002, 7:0.109]
C-3{n=1 c=[1.000, -1.189, -1.837, -0.006, 0.050, -0.006, -0.070, -0.024, 1.000] r=[]}
    Weight : [props - optional]:  Point:
    1.0: [1.000, -0.213, -0.956, -0.003, 0.056, 0.091, 0.017, -0.024, 1.000]
16/12/05 05:43:59 INFO clustering.ClusterDumper: Wrote 4 clusters
Then list the output directory with hadoop fs -ls /20140824:
16/12/05 05:55:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxr-xr-x   - root supergroup          0 2016-12-05 05:43 /20140824/clusteredPoints
drwxr-xr-x   - root supergroup          0 2016-12-05 05:42 /20140824/clusters-0-final
drwxr-xr-x   - root supergroup          0 2016-12-05 05:39 /20140824/data
-rw-r--r--   1 root supergroup        231 2016-12-05 05:21 /20140824/test-data.csv
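As a rough check on the thresholds used in this run, the Python sketch below parses the four rows of test-data.csv, computes the pairwise squared Euclidean distances (the measure the CanopyDriver log line reports), and compares them with -t1 10 and -t2 1. It assumes every column, including the leading 1 and the trailing label, is part of the vector, as the dumped vectors suggest. The helper names are hypothetical, and it models only a single sequential pass over the raw rows, not Mahout's full map-and-reduce job.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# The four rows of test-data.csv from the session above (space-separated).
TEST_DATA = """\
1 -0.213  -0.956  -0.003  0.056  0.091  0.017  -0.024  1
1 3.147  2.129  -0.006  -0.056  -0.063  -0.002  0.109  0
1 -2.165  -2.718  -0.008  0.043  -0.103  -0.156  -0.024  1
1 -4.337  -2.686  -0.012  0.122  0.082  -0.021  -0.042  1
"""
T1, T2 = 10.0, 1.0  # the -t1 and -t2 values passed to the Job

Point = Tuple[float, ...]


def parse(text: str) -> List[Point]:
    """Assumption: every column, including the first and last, is a vector component."""
    return [tuple(float(v) for v in line.split()) for line in text.splitlines() if line.strip()]


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pairwise(points: List[Point]) -> Dict[Tuple[int, int], float]:
    """Squared distance for every pair of 0-based row indices."""
    return {(i, j): squared_euclidean(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def within(dists: Dict[Tuple[int, int], float], threshold: float) -> List[Tuple[int, int]]:
    return sorted(pair for pair, d in dists.items() if d < threshold)


def count_canopies(points: List[Point], t2: float) -> int:
    """One sequential pass: a point founds a canopy unless it lies within t2
    of an existing center. T1 only decides membership, not the count."""
    centers: List[Point] = []
    for p in points:
        if not any(squared_euclidean(c, p) < t2 for c in centers):
            centers.append(p)
    return len(centers)


if __name__ == "__main__":
    pts = parse(TEST_DATA)
    dists = pairwise(pts)
    for (i, j), d in sorted(dists.items()):
        print(f"rows {i + 1} and {j + 1}: {d:.3f}")
    print("pairs within T1:", within(dists, T1))
    print("pairs within T2:", within(dists, T2))
    print("canopies in one pass:", count_canopies(pts, T2))
```

Rows 1 and 3 (about 6.98) and rows 3 and 4 (about 4.78) fall within T1, while no pair falls within T2, so every row founds its own canopy in a single pass. That gives four canopies, the same count as the "Wrote 4 clusters" line. The dumped centers are averages rather than raw rows; C-3's center, for instance, equals the mean of rows 1 and 3.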

That concludes this introduction to using Mahout canopy. I hope it has cleared up your questions. Combining theory with practice is the best way to learn, so try it out yourself.
