分群

統計軟體 R

簡介

安裝

操作方式

變數與運算

有序數列

向量

矩陣

多維陣列

複數

因子

串列

資料框

時間數列

流程控制

輸出入

呼叫

函數

2D 繪圖

3D 繪圖

互動介面

套件列表

其他語言呼叫

R 的應用

集合

邏輯推論

模糊邏輯

機率邏輯

檢定

搜尋

優化算法

線性代數

決策樹

人工智慧

分群分類

SVM 向量機

神經網路

遺傳演算法

資料採礦

訊號處理

影像處理

語音處理

自然語言

機器學習

機器人

生物統計

數位訊號處理

方程式求解

數值分析

微積分

微分方程

線性規劃

圖形理論

統計推論

字串處理

正規表示式

視窗程式

網頁程式

文件格式

貝氏網路

訊息

機率統計書

相關網站

參考文獻

最新修改

簡體版

English

k-means

require(graphics)

# a 2-dimensional example
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
colnames(x) <- c("x", "y")
(cl <- kmeans(x, 2))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:2, pch = 8, cex = 2)

# sum of squares
ss <- function(x) sum(scale(x, scale = FALSE)^2)

## cluster centers "fitted" to each obs.:
fitted.x <- fitted(cl);  head(fitted.x)
resid.x <- x - fitted(cl)

## Equalities : ----------------------------------
cbind(cl[c("betweenss", "tot.withinss", "totss")], # the same two columns
         c(ss(fitted.x), ss(resid.x),    ss(x)))
stopifnot(all.equal(cl$ totss,        ss(x)),
      all.equal(cl$ tot.withinss, ss(resid.x)),
      ## these three are the same:
      all.equal(cl$ betweenss,    ss(fitted.x)),
      all.equal(cl$ betweenss, cl$totss - cl$tot.withinss),
      ## and hence also
      all.equal(ss(x), ss(fitted.x) + ss(resid.x))
      )

kmeans(x,1)$withinss # trivial one-cluster, (its W.SS == ss(x))

## random starts do help here with too many clusters
## (and are often recommended anyway!):
(cl <- kmeans(x, 5, nstart = 25))
plot(x, col = cl$cluster)
points(cl$centers, col = 1:5, pch = 8)

決策樹

fit <- rpart(Price ~ Mileage + Type + Country, cu.summary)
par(xpd = TRUE)
plot(fit, compress = TRUE)
text(fit, use.n = TRUE)

hclust

require(graphics)

hc <- hclust(dist(USArrests), "ave")
plot(hc)
plot(hc, hang = -1)

## Do the same with centroid clustering and squared Euclidean distance,
## cut the tree into ten clusters and reconstruct the upper part of the
## tree from the cluster centers.
hc <- hclust(dist(USArrests)^2, "cen")
memb <- cutree(hc, k = 10)
cent <- NULL
for(k in 1:10){
  cent <- rbind(cent, colMeans(USArrests[memb == k, , drop = FALSE]))
}
hc1 <- hclust(dist(cent)^2, method = "cen", members = table(memb))
opar <- par(mfrow = c(1, 2))
plot(hc,  labels = FALSE, hang = -1, main = "Original Tree")
plot(hc1, labels = FALSE, hang = -1, main = "Re-start from 10 clusters")
par(opar)

參考文獻

  1. http://www.statmethods.net/advstats/cluster.html
  2. Quick-R : accessing the power of R
  3. K-Means Clustering
  4. Pragmatic Programming Techniques : Machine Learning in R: Clustering, SUNDAY, APRIL 8, 2012
  5. R Tutorial
  6. RDataMining

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License