Association mining is one of the most useful methods for analysing unordered data. It originated in marketing as basket analysis, which looks for goods bought together in a single transaction, but it works equally well wherever objects are described by a set of arbitrary categorical attributes. The shopping basket remains the best example: someone who buys bread also buys butter and cheese; someone who buys beer also buys chips; someone who buys chips also buys cola, and so on. Rules are not symmetric: beer => chips is not the same as chips => beer, because (for example) beer is bought by adults, who buy chips with it, whereas 90% of chips are bought by children together with cola.
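The asymmetry of rules can be illustrated with a few lines of base R. This is only a sketch on made-up baskets; the helper `confidence()` is a hypothetical name introduced here and is not part of arules.

```r
# Toy baskets (made-up data, not the course dataset)
baskets <- list(
  c("beer", "chips"),
  c("beer", "chips"),
  c("chips", "cola"),
  c("chips", "cola"),
  c("chips", "cola"),
  c("chips", "cola"),
  c("bread", "butter")
)

# Share of the baskets containing every item of `lhs` that also contain `rhs`
confidence <- function(baskets, lhs, rhs) {
  has_lhs <- vapply(baskets, function(b) all(lhs %in% b), logical(1))
  has_rhs <- vapply(baskets, function(b) all(rhs %in% b), logical(1))
  sum(has_lhs & has_rhs) / sum(has_lhs)
}

conf_beer_chips <- confidence(baskets, "beer", "chips")  # 2 of 2 beer baskets
conf_chips_beer <- confidence(baskets, "chips", "beer")  # 2 of 6 chips baskets
```

Both directions use the same two baskets containing beer and chips, but they are divided by different totals, which is why the two rules differ in strength.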
The original data need not be organised this way: they may be stored as lines, one per transaction, and may contain repeated labels (e.g. beer, beer, beer, chips). On import the data are always converted to sets, so duplicates are removed.
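What "converted to sets" means can be sketched in base R, assuming one comma-separated line per transaction. The input lines are made up, and this only mimics the effect of the import step rather than reproducing how arules implements it.

```r
# Two made-up input lines with repeated labels
raw_lines <- c("beer,beer,beer,chips", "chips,cola,chips")

# Split each line into labels, then keep each label once per transaction
items <- strsplit(raw_lines, ",", fixed = TRUE)
sets  <- lapply(items, unique)

sets[[1]]  # "beer" "chips"
sets[[2]]  # "chips" "cola"
```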
library(arules)
library(arulesViz)
Loading required package: grid
library(tidyverse)
── Attaching packages ──────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.0.0     ✔ purrr   0.2.5
✔ tibble  1.4.2     ✔ dplyr   0.7.6
✔ tidyr   0.8.1     ✔ stringr 1.3.1
✔ readr   1.1.1     ✔ forcats 0.3.0
── Conflicts ─────────────────────── tidyverse_conflicts() ──
✖ tidyr::expand()    masks Matrix::expand()
✖ tidyr::extract()   masks magrittr::extract()
✖ dplyr::filter()    masks stats::filter()
✖ dplyr::lag()       masks stats::lag()
✖ dplyr::recode()    masks arules::recode()
✖ purrr::set_names() masks magrittr::set_names()
library(magrittr)
Rule analysis is done with the arules package, which is built around the popular apriori algorithm, widely used for this kind of analysis. We will analyse the Hurghada dataset, which contains anonymised data on the customers of a travel agency and the services they purchased at resorts in Hurghada (Egypt). The data are stored as a CSV file in which each row is one transaction and the columns list the purchased services, following the rule: either the service label is present or the field is empty.
Transactions are read with read.transactions().
A rule is a notation for an item that co-occurs with some set of
other items. A rule has a right-hand side (RHS), which is always
a single item, and a left-hand side (LHS), which is a set of items
(possibly empty). The two sides are joined with
the => sign.
hurg <- read.transactions("data/hurg.csv",sep=",")
hurg
transactions in sparse format with
258 transactions (rows) and
30 items (columns)
summary(hurg)
transactions as itemMatrix in sparse format with
258 rows (elements/itemsets/transactions) and
30 columns (items) and a density of 0.1776486
most frequent items:
     1w    HURG   WRZES      AL      HB (Other)
    186     103     100      88      85     813
element (itemset/transaction) length distribution:
sizes
 0  3  4  5  6  7  8  9 10
 1 20 66 65 51 31 16  5  3
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   4.000   5.000   5.329   6.000  10.000
includes extended item information - examples:
labels
1 1w
2 2w
3 3w
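The density and the mean transaction length reported by summary() can be recomputed by hand from the item counts printed above. The sketch below uses only numbers copied from that output; the variable names are introduced here for illustration.

```r
# Numbers copied from the summary(hurg) output above
n_transactions <- 258
n_items        <- 30
item_counts <- c(`1w` = 186, HURG = 103, WRZES = 100, AL = 88, HB = 85,
                 `(Other)` = 813)

# Total number of filled cells in the transactions x items incidence matrix
total_ones <- sum(item_counts)

# Density: share of filled cells in the matrix
density <- total_ones / (n_transactions * n_items)

# Mean transaction length: filled cells per row
mean_length <- total_ones / n_transactions
```

Rounded, the two values agree with the density of 0.1776486 and the mean of 5.329 printed by summary().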
itemLabels(hurg) # display the item labels
[1] "1w" "2w" "3w" "ABU" "AL" "AL+" "ANIM" "AQUA" "B" "CZERW" "DZIEC" "Free"
[13] "GRUDZ" "HB" "HURG" "KAIR" "KARNAK" "LIPIEC" "MAJ" "NURK" "PARACH" "PAZDZ" "PUST" "QUAD"
[25] "REJS" "RESTAUR" "RYBY" "SIERP" "TENIS" "WRZES"
inspect(hurg) # display all transactions as itemsets
items
[1] {1w,AL,CZERW,KARNAK}
[2] {1w,AL+,CZERW,KARNAK}
[3] {1w,HB,HURG,SIERP}
[4] {1w,AQUA,DZIEC,HB,HURG,SIERP}
[5] {1w,AQUA,DZIEC,HB,PUST,SIERP}
[6] {1w,AL,CZERW,NURK,PUST,QUAD,RESTAUR,RYBY}
[7] {1w,AL+,CZERW,NURK,PARACH,QUAD}
[8] {1w,AL+,ANIM,DZIEC,LIPIEC,TENIS}
[9] {1w,ABU,AL,ANIM,DZIEC,PAZDZ,PUST,RESTAUR}
[10] {1w,ABU,AL+,ANIM,AQUA,DZIEC,RESTAUR,SIERP}
[11] {1w,HB,HURG,RESTAUR,WRZES}
[12] {1w,HB,HURG,RESTAUR,WRZES}
[13] {1w,AL,ANIM,DZIEC,NURK,PARACH,PUST,QUAD,RESTAUR,SIERP}
[14] {1w,AL,ANIM,DZIEC,PUST,RESTAUR,WRZES}
[15] {1w,AL,ANIM,AQUA,DZIEC,NURK,TENIS,WRZES}
[16] {1w,HB,HURG,PUST,WRZES}
[17] {1w,HB,HURG,RESTAUR,WRZES}
[18] {1w,HB,HURG,RESTAUR,WRZES}
[19] {1w,AL,CZERW,DZIEC,KARNAK,PUST}
[20] {1w,AL,CZERW,DZIEC,PUST,REJS}
[21] {1w,HB,HURG,RESTAUR,WRZES}
[22] {1w,AL,AQUA,DZIEC,HURG,KARNAK,LIPIEC,PUST,RESTAUR}
[23] {1w,AL,DZIEC,SIERP}
[24] {1w,AL,DZIEC,WRZES}
[25] {1w,HB,KARNAK,WRZES}
[26] {1w,HB,KARNAK,WRZES}
[27] {1w,HB,KARNAK,WRZES}
[28] {1w,AL+,NURK,PARACH,QUAD,RYBY,SIERP}
[29] {1w,AL+,SIERP,TENIS}
[30] {1w,AL+,SIERP,TENIS}
[31] {1w,ABU,HB,HURG,WRZES}
[32] {1w,AQUA,CZERW,DZIEC,HB,HURG,RESTAUR}
[33] {1w,AL,AQUA,NURK,RESTAUR,SIERP,TENIS}
[34] {1w,AL+,NURK,SIERP,TENIS}
[35] {1w,AL,WRZES}
[36] {1w,HB,HURG,RESTAUR,WRZES}
[37] {1w,AL+,WRZES}
[38] {1w,AL+,WRZES}
[39] {1w,HB,HURG,WRZES}
[40] {1w,HB,HURG,WRZES}
[41] {1w,AL+,WRZES}
[42] {1w,ABU,AL+,WRZES}
[43] {1w,ABU,AL,PUST,RESTAUR,WRZES}
[44] {1w,ABU,AL,NURK,QUAD,RYBY,WRZES}
[45] {1w,HB,HURG,SIERP}
[46] {1w,DZIEC,HB,HURG,LIPIEC,RESTAUR}
[47] {1w,AL,HURG,WRZES}
[48] {1w,AL,HURG,WRZES}
[49] {1w,B,HURG,WRZES}
[50] {1w,B,HURG,WRZES}
[51] {1w,AL,HURG,WRZES}
[52] {1w,AL,HURG,PUST,RESTAUR,WRZES}
[53] {1w,AL,KARNAK,WRZES}
[54] {1w,CZERW,DZIEC,HB,HURG,KARNAK,RESTAUR}
[55] {1w,AQUA,DZIEC,HB,LIPIEC}
[56] {1w,B,HURG,WRZES}
[57] {1w,AL,HURG,KARNAK,RESTAUR,WRZES}
[58] {1w,AL+,HURG,NURK,PUST,QUAD,WRZES}
[59] {1w,ABU,B,HURG,TENIS,WRZES}
[60] {1w,B,HURG,KARNAK,WRZES}
[61] {1w,AL+,TENIS,WRZES}
[62] {2w,ABU,AL+,CZERW,NURK,PARACH,QUAD,RYBY,TENIS}
[63] {2w,AL,CZERW,KAIR,RESTAUR}
[64] {2w,AL,ANIM,DZIEC,LIPIEC,PUST,RESTAUR}
[65] {1w,B,HURG,WRZES}
[66] {2w,AL,ANIM,DZIEC,KAIR,SIERP}
[67] {2w,AL,ANIM,DZIEC,KAIR,PARACH,QUAD,RESTAUR,SIERP}
[68] {2w,AL,ANIM,DZIEC,RESTAUR,SIERP,TENIS}
[69] {1w,HB,HURG,KARNAK,WRZES}
[70] {2w,ABU,AL,ANIM,AQUA,DZIEC,KAIR,KARNAK,QUAD,SIERP}
[71] {1w,HB,HURG,KARNAK,WRZES}
[72] {1w,AL,WRZES}
[73] {1w,AL,WRZES}
[74] {1w,AQUA,HB,HURG,WRZES}
[75] {1w,AL,WRZES}
[76] {1w,AL,DZIEC,LIPIEC}
[77] {1w,ABU,AL+,DZIEC,HURG,LIPIEC,NURK,QUAD,RESTAUR}
[78] {1w,HB,HURG,KARNAK,PUST,RESTAUR,WRZES}
[79] {1w,HB,HURG,NURK,PARACH,WRZES}
[80] {1w,ABU,AL,DZIEC,HURG,KARNAK,RESTAUR,WRZES}
[81] {1w,AL,AQUA,DZIEC,PUST,WRZES}
[82] {1w,AL,AQUA,DZIEC,NURK,QUAD,RYBY,WRZES}
[83] {1w,AL+,LIPIEC}
[84] {1w,ABU,AL+,LIPIEC,PUST,QUAD}
[85] {1w,AL,HURG,LIPIEC,RESTAUR}
[86] {1w,AL+,KARNAK,LIPIEC}
[87] {1w,AL+,LIPIEC,PUST}
[88] {1w,AL+,LIPIEC,TENIS}
[89] {1w,AL,MAJ,PUST}
[90] {1w,AL,AQUA,PAZDZ,PUST}
[91] {1w,AL,PAZDZ,PUST,RESTAUR}
[92] {2w,ABU,HB,KAIR,KARNAK,LIPIEC,PUST}
[93] {1w,AL,PAZDZ,PUST,TENIS}
[94] {1w,AL,PAZDZ,RESTAUR}
[95] {1w,AL+,SIERP}
[96] {2w,ABU,HB,KAIR,KARNAK,RESTAUR,WRZES}
[97] {1w,AL+,SIERP}
[98] {1w,ABU,AL+,QUAD,SIERP}
[99] {2w,ABU,HB,KAIR,PUST,TENIS,WRZES}
[100] {2w,HB,KAIR,KARNAK,WRZES}
[101] {1w,ABU,AL+,HURG,KARNAK,PUST,SIERP}
[102] {1w,AL,HURG,KAIR,SIERP}
[103] {1w,AL+,PUST,SIERP,TENIS}
[104] {1w,HB,HURG,PUST,WRZES}
[105] {1w,AL,HURG,PAZDZ}
[106] {1w,AL,HURG,KARNAK,PAZDZ,PUST,RESTAUR}
[107] {1w,AL,HURG,KARNAK,PAZDZ,RESTAUR}
[108] {1w,HB,HURG,PUST,WRZES}
[109] {1w,HB,HURG,TENIS,WRZES}
[110] {1w,AL,RESTAUR,WRZES}
[111] {1w,AL,RESTAUR,WRZES}
[112] {1w,Free,HURG,PUST,SIERP}
[113] {1w,AL,TENIS,WRZES}
[114] {2w,AL,ANIM,AQUA,DZIEC,KARNAK,WRZES}
[115] {2w,AL,DZIEC,KAIR,LIPIEC,RYBY}
[116] {2w,AL,DZIEC,KAIR,SIERP}
[117] {2w,AL,DZIEC,KAIR,KARNAK,RYBY,SIERP}
[118] {1w,HB,LIPIEC}
[119] {1w,AL+,LIPIEC}
[120] {1w,ABU,AL+,LIPIEC,NURK,RYBY}
[121] {1w,HB,HURG,LIPIEC,PUST}
[122] {1w,HB,HURG,LIPIEC,PUST}
[123] {1w,HB,LIPIEC,TENIS}
[124] {1w,HB,HURG,SIERP}
[125] {2w,AL,DZIEC,RYBY,SIERP}
[126] {2w,AL,LIPIEC}
[127] {2w,ABU,AL,HURG,KAIR,KARNAK,LIPIEC,PUST,RESTAUR,TENIS}
[128] {2w,AL+,KAIR,LIPIEC,PUST,REJS}
[129] {2w,AL,KAIR,LIPIEC,RESTAUR,TENIS}
[130] {2w,ANIM,AQUA,CZERW,DZIEC,HB,KAIR,RESTAUR}
[131] {2w,ABU,ANIM,DZIEC,HB,KAIR,PUST,WRZES}
[132] {2w,AL+,KARNAK,LIPIEC,RYBY}
[133] {2w,AL+,LIPIEC,REJS,TENIS}
[134] {1w,HB,SIERP}
[135] {1w,AQUA,HB,SIERP}
[136] {1w,HB,HURG,SIERP}
[137] {2w,ABU,AL,KAIR,KARNAK,PAZDZ}
[138] {1w,AL,ANIM,AQUA,DZIEC,LIPIEC}
[139] {1w,AL,ANIM,DZIEC,HURG,LIPIEC,RESTAUR}
[140] {1w,B,WRZES}
[141] {1w,B,WRZES}
[142] {2w,ABU,AL,HURG,KAIR,KARNAK,RESTAUR,WRZES}
[143] {2w,AL,KAIR,PUST,REJS,WRZES}
[144] {2w,AL,KAIR,PUST,RESTAUR,WRZES}
[145] {1w,HB,HURG,WRZES}
[146] {1w,HB,HURG,WRZES}
[147] {2w,AL,KAIR,REJS,WRZES}
[148] {2w,AL+,REJS,WRZES}
[149] {2w,AL,RESTAUR,WRZES}
[150] {3w,AL,CZERW,HURG,KARNAK,TENIS}
[151] {1w,HB,NURK,RESTAUR,WRZES}
[152] {1w,HB,RESTAUR,WRZES}
[153] {2w,CZERW,DZIEC,HB,KAIR}
[154] {2w,ABU,CZERW,DZIEC,HB,KAIR}
[155] {2w,DZIEC,HB,KARNAK,PUST,RYBY,SIERP}
[156] {3w,ABU,AL+,KAIR,KARNAK,LIPIEC}
[157] {3w,ABU,AL,KAIR,KARNAK,PAZDZ,RESTAUR}
[158] {3w,ABU,AL+,KAIR,KARNAK,PUST,SIERP}
[159] {3w,AL+,HURG,KAIR,KARNAK,PUST,SIERP}
[160] {1w,B,CZERW,HURG}
[161] {1w,B,CZERW,HURG,NURK}
[162] {1w,B,CZERW,HURG,NURK}
[163] {2w,AL,KAIR,PUST,RESTAUR,WRZES}
[164] {2w,AL,KAIR,REJS,WRZES}
[165] {2w,AL+,REJS,WRZES}
[166] {2w,AL,RESTAUR,WRZES}
[167] {3w,AL,CZERW,HURG,KARNAK,TENIS}
[168] {1w,B,CZERW,NURK}
[169] {1w,HB,NURK,RESTAUR,WRZES}
[170] {1w,HB,RESTAUR,WRZES}
[171] {2w,ANIM,AQUA,CZERW,DZIEC,HB,KAIR,RESTAUR}
[172] {2w,ABU,ANIM,DZIEC,HB,KAIR,PUST,WRZES}
[173] {2w,CZERW,DZIEC,HB,KAIR}
[174] {2w,ABU,CZERW,DZIEC,HB,KAIR}
[175] {2w,DZIEC,HB,KARNAK,PUST,RYBY,SIERP}
[176] {1w,B,DZIEC,HURG,SIERP}
[177] {1w,B,DZIEC,HURG,KARNAK,SIERP}
[178] {1w,B,HURG,LIPIEC}
[179] {2w,AL,KAIR,NURK,PARACH,PAZDZ,QUAD,RESTAUR}
[180] {2w,AL,KARNAK,PAZDZ,PUST}
[181] {1w,HB,SIERP}
[182] {2w,AL,KAIR,SIERP}
[183] {2w,AL,HURG,KAIR,KARNAK,SIERP}
[184] {1w,B,CZERW,NURK,PARACH,PUST,RESTAUR,RYBY}
[185] {2w,AL,KAIR,PUST,RESTAUR,SIERP}
[186] {2w,AL,KAIR,REJS,SIERP}
[187] {2w,AL,KAIR,REJS,RESTAUR,SIERP}
[188] {1w,B,HURG,LIPIEC}
[189] {2w,ABU,AL,KAIR,KARNAK,PAZDZ,RESTAUR}
[190] {2w,AL,KAIR,KARNAK,PAZDZ,PUST}
[191] {1w,AL,CZERW,TENIS}
[192] {1w,ABU,B,HURG,LIPIEC,QUAD,RESTAUR}
[193] {1w,B,HURG,LIPIEC,PUST}
[194] {2w,AL,KAIR,REJS,SIERP}
[195] {1w,B,KARNAK,LIPIEC,PUST}
[196] {2w,AL,HURG,KAIR,SIERP,TENIS}
[197] {2w,AL,KAIR,PUST,SIERP}
[198] {1w,B,PAZDZ}
[199] {1w,B,HURG,KARNAK,PAZDZ}
[200] {1w,B,HURG,SIERP}
[201] {1w,AQUA,HB,HURG,KARNAK,RESTAUR,WRZES}
[202] {1w,AL,WRZES}
[203] {1w,B,HURG,SIERP}
[204] {1w,B,HURG,RESTAUR,SIERP}
[205] {1w,ABU,B,WRZES}
[206] {1w,B,HURG,WRZES}
[207] {1w,HB,HURG,WRZES}
[208] {1w,HB,HURG,WRZES}
[209] {2w,AL,HURG,KAIR,SIERP,TENIS}
[210] {2w,AL,KAIR,PUST,SIERP}
[211] {1w,HB,HURG,WRZES}
[212] {1w,HB,HURG,WRZES}
[213] {1w,B,HURG,WRZES}
[214] {1w,Free,GRUDZ,HURG,PUST}
[215] {1w,B,NURK,WRZES}
[216] {1w,B,NURK,RYBY,WRZES}
[217] {2w,B,CZERW,HURG,KAIR,KARNAK}
[218] {2w,B,KARNAK,LIPIEC,NURK,RYBY}
[219] {1w,CZERW,Free,KARNAK,PUST,TENIS}
[220] {1w,Free,GRUDZ}
[221] {1w,CZERW,DZIEC,HB,HURG}
[222] {1w,AQUA,CZERW,DZIEC,HB,HURG}
[223] {1w,B,HURG,NURK,RYBY,WRZES}
[224] {1w,B,HURG,RESTAUR,WRZES}
[225] {1w,DZIEC,HB,HURG,LIPIEC}
[226] {1w,AQUA,DZIEC,HB,HURG,LIPIEC}
[227] {1w,B,KARNAK,NURK,WRZES}
[228] {1w,AQUA,DZIEC,HB,HURG,LIPIEC}
[229] {1w,DZIEC,HB,HURG,LIPIEC,QUAD}
[230] {2w,ABU,Free,GRUDZ,KAIR,KARNAK,REJS}
[231] {3w,ABU,Free,GRUDZ,HURG,KAIR,KARNAK,PUST,RESTAUR}
[232] {1w,AQUA,CZERW,HB}
[233] {1w,ABU,ANIM,AQUA,CZERW,DZIEC,HB}
[234] {1w,ANIM,CZERW,DZIEC,HB,HURG}
[235] {1w,ANIM,CZERW,DZIEC,HB,HURG,RESTAUR,TENIS}
[236] {1w,CZERW,DZIEC,Free}
[237] {1w,CZERW,DZIEC,HB}
[238] {1w,CZERW,DZIEC,HB,HURG}
[239] {1w,AL+,LIPIEC,NURK,PUST,QUAD,TENIS}
[240] {1w,CZERW,DZIEC,HB,HURG}
[241] {1w,DZIEC,HB,WRZES}
[242] {1w,AQUA,DZIEC,HB,WRZES}
[243] {1w,AL,ANIM,DZIEC,LIPIEC,PUST}
[244] {2w,AL,KARNAK,RESTAUR,SIERP}
[245] {2w,AL+,RYBY,SIERP}
[246] {2w,ABU,AL,KAIR,RYBY,WRZES}
[247] {1w,AQUA,DZIEC,HB,RESTAUR,WRZES}
[248] {1w,DZIEC,HB,HURG,WRZES}
[249] {3w,ABU,Free,GRUDZ,KAIR,KARNAK,PUST}
[250] {1w,AQUA,DZIEC,HB,HURG,RESTAUR,WRZES}
[251] {1w,DZIEC,HB,HURG,RESTAUR,WRZES}
[252] {1w,ABU,Free,GRUDZ,HURG,KARNAK}
[253] {1w,ABU,HB,HURG,SIERP}
[254] {1w,AL+,LIPIEC,NURK,PARACH,PUST,QUAD,RYBY}
[255] {1w,AL+,LIPIEC,NURK,PARACH,QUAD,RYBY,TENIS}
[256] {1w,HB,HURG,KARNAK,SIERP}
[257] {1w,HB,HURG,TENIS,WRZES}
[258] {}
A first check for frequently repeated labels is the itemFrequencyPlot plot, which resembles a scree plot, together with a dendrogram built on the dissimilarity computed by dissimilarity(). Similarity between transactions is best captured by the affinity measure (“affinity”), which gives the average affinity (the occurrence of shared items) between transactions.
hurg %>% itemFrequencyPlot(topN=20,cex.names=0.8)
dissimilarity(hurg,method = "affinity") %>% hclust(method="ward.D2") %>% plot
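The bars of an item frequency plot are relative item frequencies sorted in decreasing order. A minimal base R sketch of that computation follows; the four transactions are made up (they only reuse labels from the dataset) and are assumed to hold each item at most once.

```r
# Made-up transactions reusing a few labels from the dataset
tx <- list(
  c("1w", "AL", "WRZES"),
  c("1w", "HB", "HURG", "WRZES"),
  c("2w", "AL", "KAIR"),
  c("1w", "HB", "HURG")
)

# Relative frequency of each item: transactions containing it / all transactions
item_freq <- sort(table(unlist(tx)) / length(tx), decreasing = TRUE)

# The three most frequent items, analogous to topN = 3
top3 <- head(item_freq, 3)
```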
Rules are mined with the apriori() function.
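$ \mathrm{Support}(A \Rightarrow B) = P(A \cap B) $
$ \mathrm{Confidence}(A \Rightarrow B) = \frac{P(A \cap B)}{P(A)} = P(B \mid A) $
$ \mathrm{Expected\ Confidence}(A \Rightarrow B) = P(B) $
Lift is confidence divided by expected confidence:
$ \mathrm{Lift}(A \Rightarrow B) = \frac{\mathrm{Confidence}(A \Rightarrow B)}{P(B)} = \frac{P(A \cap B)}{P(A)\,P(B)} $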
Support is the share of all transactions in which A and B occur together. Confidence is the share of transactions containing A that also contain B. Lift tells whether the co-occurrence of A and B exceeds the probability of their co-occurrence if they were independent of each other: a lift above 1 means they appear together more often than independence would predict. The larger the lift, the greater the chance that A and B occur together.
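The three measures can be computed directly from a list of transactions. The sketch below is base R on made-up data; `rule_measures()` is a hypothetical helper written for this illustration, not an arules function.

```r
# Support, confidence and lift of the rule lhs => rhs
rule_measures <- function(tx, lhs, rhs) {
  n <- length(tx)
  has_lhs <- vapply(tx, function(t) all(lhs %in% t), logical(1))
  has_rhs <- vapply(tx, function(t) all(rhs %in% t), logical(1))
  support    <- sum(has_lhs & has_rhs) / n      # P(A and B)
  confidence <- support / (sum(has_lhs) / n)    # P(A and B) / P(A)
  lift       <- confidence / (sum(has_rhs) / n) # confidence / P(B)
  c(support = support, confidence = confidence, lift = lift)
}

# Five made-up transactions
tx <- list(c("A", "B"), c("A", "B"), c("A"), c("B"), c("C"))

m <- rule_measures(tx, "A", "B")
# support 2/5, confidence 2/3, lift (2/3) / (3/5) = 10/9
```

Here the lift is slightly above 1, so A and B appear together a little more often than they would if they were independent.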
itemsets <- hurg %>% apriori(parameter=list(target="rules",support=0.01,confidence=0.5))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
0.5 0.1 1 none FALSE TRUE 5 0.01 1 10 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 2
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[30 item(s), 258 transaction(s)] done [0.00s].
sorting and recoding items ... [29 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4 5 6 done [0.00s].
writing ... [2246 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
itemsets %>% plot()
itemsets %>% subset(subset = support<0.1 & confidence>0.9 & lift>20) %>% inspect
lhs rhs support confidence lift count
[1] {GRUDZ} => {Free} 0.02325581 1 28.66667 6
[2] {ABU,GRUDZ} => {Free} 0.01550388 1 28.66667 4
[3] {ABU,Free} => {GRUDZ} 0.01550388 1 43.00000 4
[4] {GRUDZ,KAIR} => {Free} 0.01162791 1 28.66667 3
[5] {Free,KAIR} => {GRUDZ} 0.01162791 1 43.00000 3
[6] {GRUDZ,KARNAK} => {Free} 0.01550388 1 28.66667 4
[7] {GRUDZ,PUST} => {Free} 0.01162791 1 28.66667 3
[8] {GRUDZ,HURG} => {Free} 0.01162791 1 28.66667 3
[9] {1w,GRUDZ} => {Free} 0.01162791 1 28.66667 3
[10] {ABU,GRUDZ,KAIR} => {Free} 0.01162791 1 28.66667 3
[11] {ABU,Free,KAIR} => {GRUDZ} 0.01162791 1 43.00000 3
[12] {ABU,GRUDZ,KARNAK} => {Free} 0.01550388 1 28.66667 4
[13] {ABU,Free,KARNAK} => {GRUDZ} 0.01550388 1 43.00000 4
[14] {GRUDZ,KAIR,KARNAK} => {Free} 0.01162791 1 28.66667 3
[15] {Free,KAIR,KARNAK} => {GRUDZ} 0.01162791 1 43.00000 3
[16] {AL+,KAIR,KARNAK} => {3w} 0.01162791 1 32.25000 3
[17] {AL+,QUAD,RYBY} => {PARACH} 0.01550388 1 25.80000 4
[18] {ABU,GRUDZ,KAIR,KARNAK} => {Free} 0.01162791 1 28.66667 3
[19] {ABU,Free,KAIR,KARNAK} => {GRUDZ} 0.01162791 1 43.00000 3
[20] {AL+,NURK,QUAD,RYBY} => {PARACH} 0.01550388 1 25.80000 4
[21] {1w,AL+,QUAD,RYBY} => {PARACH} 0.01162791 1 25.80000 3
[22] {1w,AL+,NURK,QUAD,RYBY} => {PARACH} 0.01162791 1 25.80000 3
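Only a few distinct lift values appear in this table because every listed rule has confidence 1, so its lift reduces to 1 / P(RHS). The check below redoes the arithmetic for rule [1] in base R. The count of 9 transactions containing Free is not printed by inspect(); it was counted by hand from the transaction listing, so treat it as an assumption of this sketch.

```r
n           <- 258  # number of transactions
count_grudz <- 6    # transactions containing GRUDZ
count_free  <- 9    # transactions containing Free (counted by hand from the listing)
count_both  <- 6    # transactions containing both (count column of rule [1])

support    <- count_both / n
confidence <- count_both / count_grudz
lift       <- confidence / (count_free / n)

# A confidence-1 rule with GRUDZ on the right-hand side has lift 1 / P(GRUDZ)
lift_rhs_grudz <- 1 / (count_grudz / n)
```

The results match the values 0.02325581, 1 and 28.66667 in row [1], and 43 for the rules whose right-hand side is GRUDZ.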
itemsets %>% subset(subset = lhs %in% 'SIERP' & support >0.05) %>% inspect
lhs rhs support confidence lift count
[1] {SIERP} => {1w} 0.12015504 0.5740741 0.7962963 31
[2] {KAIR,SIERP} => {2w} 0.05813953 0.8333333 3.4126984 15
[3] {2w,SIERP} => {KAIR} 0.05813953 0.7142857 3.5439560 15
[4] {KAIR,SIERP} => {AL} 0.06201550 0.8888889 2.6060606 16
[5] {AL,SIERP} => {KAIR} 0.06201550 0.7272727 3.6083916 16
[6] {2w,SIERP} => {AL} 0.06976744 0.8571429 2.5129870 18
[7] {AL,SIERP} => {2w} 0.06976744 0.8181818 3.3506494 18
[8] {HURG,SIERP} => {1w} 0.05813953 0.7894737 1.0950764 15
[9] {2w,KAIR,SIERP} => {AL} 0.05813953 1.0000000 2.9318182 15
[10] {AL,KAIR,SIERP} => {2w} 0.05813953 0.9375000 3.8392857 15
[11] {2w,AL,SIERP} => {KAIR} 0.05813953 0.8333333 4.1346154 15
itemsets %>% subset(subset = support<0.1 & confidence>0.95 & lift>20) %>% plot(method="graph")
The analysis below does not use the sf package, because it is still too “new” and lacks good documentation.
library(optpart)
library(rgdal)
The analysis uses shops located in the centre of Poznań (downloaded from Open Street Map). Baskets are built from groups of shops lying within 75 metres of one another. The relationships are derived from the distance matrix treated as a graph (covered earlier); within the graph we identify cliques, that is, sets of objects that are all connected to each other. The analysis starts by reading the data with readOGR() from the rgdal package, computing the distances between the shops and rescaling them to the range 0 to 1, and then finding the cliques, which are converted into a transactions object that apriori() understands.
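The idea of treating a distance matrix as a graph can be sketched in base R. The coordinates below are made up (in metres), and `is_clique()` is a hypothetical helper that only tests whether a given group is fully connected; it does not search for cliques the way the clique() function does.

```r
# Made-up shop coordinates in metres (not the Poznań data)
xy <- rbind(a = c(0, 0), b = c(50, 0), c = c(25, 40), d = c(200, 0))

# Pairwise distances; two shops are linked when they lie at most 75 m apart
d <- as.matrix(dist(xy))
adjacent <- d <= 75
diag(adjacent) <- FALSE  # a shop is not its own neighbour

# TRUE when every pair of members is linked
is_clique <- function(members, adjacent) {
  if (length(members) < 2) return(TRUE)
  pairs <- combn(members, 2)
  all(adjacent[cbind(pairs[1, ], pairs[2, ])])
}

is_clique(c("a", "b", "c"), adjacent)  # all three lie within 75 m of each other
is_clique(c("a", "b", "d"), adjacent)  # d is too far from a and b
```

Dividing the distances by 1000 and using a threshold of 0.075 gives the same links, since both sides of the comparison are rescaled by the same factor.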
sklepy=readOGR(dsn = "data",layer = 'sklepy')
OGR data source with driver: ESRI Shapefile
Source: "/home/jarekj/Dropbox/dydaktyka/data_mining/kurs/dzien_3_2/data", layer: "sklepy"
with 289 features
It has 2 fields
d <- dist(sklepy@coords)
d <- d/1000 # d must lie between 0 and 1, so metres are rescaled to kilometres
plot(sklepy@coords)
kliki <- d %>% clique(0.075,1) # 75 metres
tmpSets <- kliki$member %>% lapply(function(x) {sklepy$shop[x]}) # convert the clique object to a list of shop types
names(tmpSets) <- paste("TR",c(1:length(tmpSets)),sep="")
spatialSets <- tmpSets %>% as("transactions") # convert to transactions (sets), dropping duplicates
removing duplicated items in transactions
Overview of the dataset:
spatialSets %>% inspect()
items transactionID
[1] {supermarket} TR1
[2] {bakery,kiosk,ticket} TR2
[3] {bicycle} TR3
[4] {chemist,kiosk} TR4
[5] {kiosk} TR5
[6] {convenience} TR6
[7] {bicycle} TR7
[8] {books,chemist,convenience} TR8
[9] {deli} TR9
[10] {craft} TR10
[11] {books} TR11
[12] {supermarket} TR12
[13] {chemist,convenience,jewelry} TR13
[14] {chemist,clothes,fashion,houseware,shoes,supermarket} TR14
[15] {photo,shoes} TR15
[16] {clothes,shoes} TR16
[17] {bicycle,books,clothes,fashion,shoes} TR17
[18] {bicycle,books,clothes,fashion,jewelry,shoes} TR18
[19] {books,clothes,fashion,jewelry,shoes} TR19
[20] {books,clothes,fashion,jewelry,shoes} TR20
[21] {clothes,fashion,jewelry,shoes,supermarket} TR21
[22] {clothes,fashion,jewelry,mobile_phone,shoes,supermarket} TR22
[23] {clothes,fashion,hairdresser,mobile_phone,shoes} TR23
[24] {clothes,fashion,hairdresser,mobile_phone,shoes,supermarket} TR24
[25] {clothes,fashion,hairdresser,mobile_phone,shoes} TR25
[26] {books,fashion,hairdresser,mobile_phone,shoes,yes} TR26
[27] {books,clothes,fashion,hairdresser,mobile_phone,shoes,yes} TR27
[28] {books,clothes,fashion,hairdresser,shoes,yes} TR28
[29] {books,clothes,convenience,fashion,shoes,yes} TR29
[30] {clothes,convenience,fashion,shoes} TR30
[31] {clothes,convenience,fashion} TR31
[32] {clothes,convenience,fashion} TR32
[33] {convenience} TR33
[34] {convenience} TR34
[35] {convenience} TR35
[36] {convenience} TR36
[37] {clothes,computer,convenience,fashion,jewelry,medical_supply,shoes} TR37
[38] {chemist,clothes,computer,convenience,fashion,jewelry,medical_supply,shoes} TR38
[39] {bakery,convenience} TR39
[40] {convenience} TR40
[41] {convenience} TR41
[42] {convenience} TR42
[43] {convenience,yes} TR43
[44] {convenience,yes} TR44
[45] {bicycle,electronics} TR45
[46] {mobile_phone} TR46
[47] {convenience} TR47
[48] {convenience,hairdresser,hardware} TR48
[49] {convenience,electronics,hairdresser,hardware,radiotechnics} TR49
[50] {clothes,confectionery,fashion} TR50
[51] {chemist,clothes,computer,fashion,jewelry,medical_supply,shoes,supermarket} TR51
[52] {clothes,computer,convenience,fashion,jewelry,medical_supply,shoes} TR52
[53] {books} TR53
[54] {bicycle} TR54
[55] {bakery,books,butcher,convenience} TR55
[56] {clothes,erotic} TR56
[57] {clothes,confectionery,erotic} TR57
[58] {clothes,confectionery,erotic} TR58
[59] {beverages} TR59
[60] {books,chemist,outdoor} TR60
[61] {books} TR61
[62] {bakery,kitchenware} TR62
[63] {bakery,books,clothes,fabric,optician,outdoor,toys} TR63
[64] {bakery,clothes,fabric,jewelry,optician,outdoor,toys} TR64
[65] {bakery,clothes,fabric,gift,jewelry,organic,outdoor,toys} TR65
[66] {gift,hairdresser,outdoor,toys} TR66
[67] {chemist,convenience,gift,supermarket,travel_agency} TR67
[68] {chemist,cobbler,convenience,gift,supermarket,travel_agency} TR68
[69] {gift,kiosk,optician,supermarket} TR69
[70] {chemist,convenience,gift,kiosk,supermarket,travel_agency} TR70
[71] {confectionery,hairdresser,kiosk} TR71
[72] {confectionery,hairdresser,mobile_telephony} TR72
[73] {butcher} TR73
[74] {delicatessen,dry_cleaning,optician} TR74
[75] {delicatessen,laundry,optician} TR75
[76] {laundry,optician,photo,travel_agency} TR76
[77] {books,photo,sweetshop,travel_agency} TR77
[78] {books,dishware,sweetshop,travel_agency} TR78
[79] {clothes,convenience} TR79
[80] {books,dishware,hairdresser,jewelry,sweetshop,travel_agency} TR80
[81] {antiques,convenience,deli} TR81
[82] {convenience,deli,shoes} TR82
[83] {convenience,optician,shoes} TR83
[84] {chemist,cobbler,convenience,newsagent,optician,shoes,supermarket,travel_agency} TR84
[85] {bakery,hairdresser,jewelry,travel_agency} TR85
[86] {clothes,convenience,mobile_telephony} TR86
[87] {bicycle,butcher,dry_cleaning,hairdresser,kiosk,kitchenware,mobile_telephony} TR87
[88] {bakery,bicycle,butcher,clothes,dry_cleaning,fabric,kiosk,kitchenware,mobile_telephony} TR88
[89] {books} TR89
[90] {bakery,books,perfumes} TR90
[91] {books,perfumes,travel_agency} TR91
[92] {books,mobile_phone,travel_agency} TR92
[93] {antiques,chemist,mobile_phone} TR93
[94] {books,chemist} TR94
[95] {books,chemist,mobile_phone,travel_agency} TR95
[96] {florist,kiosk,optician,supermarket} TR96
[97] {books,florist,mobile_phone,travel_agency} TR97
[98] {gift,kiosk,optician,travel_agency} TR98
[99] {florist,kiosk,optician,travel_agency} TR99
[100] {books,clothes,kiosk,optician,travel_agency} TR100
[101] {books,butcher,convenience,newsagent} TR101
[102] {fashion,jewelry,optician} TR102
[103] {jewelry,shoes} TR103
[104] {florist} TR104
[105] {convenience,fabric,kitchenware} TR105
[106] {fabric,jewelry,shoes} TR106
[107] {books,chemist} TR107
[108] {books,fashion,optician} TR108
[109] {fashion,houseware,jewelry,optician} TR109
[110] {books,fashion,jewelry,optician} TR110
[111] {books,optician} TR111
[112] {chemist,clothes,fashion,houseware,shoes,supermarket} TR112
[113] {millinery,music_store,shoes,travel_agency} TR113
[114] {alcohol,millinery,music_store,travel_agency} TR114
[115] {clothes,fashion,houseware,shoes,supermarket} TR115
[116] {clothes,fashion,houseware,shoes,supermarket} TR116
[117] {bakery,convenience,hairdresser,perfumes,photo} TR117
[118] {computer,convenience,hairdresser,photo} TR118
[119] {clothes,fashion,shoes,supermarket} TR119
[120] {clothes,fashion,shoes,supermarket} TR120
[121] {computer,fashion,shoes,supermarket} TR121
[122] {books,chemist,cobbler,newsagent,optician,shoes,supermarket} TR122
[123] {books,chemist,cobbler,convenience,newsagent,optician,shoes,supermarket} TR123
[124] {bakery,books,clothes,optician,outdoor,toys} TR124
[125] {bakery,books,newsagent,optician,outdoor,toys} TR125
[126] {bakery,books,cobbler,newsagent,optician,toys} TR126
[127] {bakery,books,cobbler,newsagent,optician,shoes} TR127
[128] {supermarket} TR128
[129] {confectionery,herbalist,kiosk} TR129
[130] {herbalist,kiosk,travel_agency} TR130
[131] {books,clothes,convenience,herbalist,travel_agency} TR131
[132] {art,convenience} TR132
[133] {art,books,convenience} TR133
[134] {books,chemist,convenience} TR134
[135] {books,chemist,outdoor} TR135
[136] {books,clothes,outdoor} TR136
[137] {clothes,computer,convenience,furniture} TR137
[138] {convenience,furniture,jewelry} TR138
[139] {clothes,confectionery,convenience,furniture,jewelry} TR139
[140] {books,clothes,outdoor,stationery} TR140
[141] {clothes,confectionery,convenience,outdoor,stationery} TR141
[142] {clothes,confectionery,convenience,jewelry,stationery} TR142
[143] {convenience,craft,wine} TR143
[144] {antiquities;coins,confectionery,convenience,shoes,wine} TR144
[145] {bakery,boutique,fabric,seafood} TR145
[146] {alcohol,bakery,boutique,convenience,craft,fabric,hairdresser,kiosk,seafood} TR146
[147] {alcohol,bakery,boutique,convenience,craft,fabric,hairdresser,seafood} TR147
[148] {convenience,craft,fabric,photo} TR148
[149] {alcohol,bakery,boutique,convenience,craft,fabric,photo,seafood} TR149
[150] {confectionery,convenience,shoes} TR150
[151] {antiques,convenience,gift,musical_instrument,pottery} TR151
[152] {antiques,gift,pottery} TR152
[153] {antiques,convenience} TR153
[154] {art,convenience,cosmetics,musical_instrument,pottery,watches} TR154
[155] {art,convenience,cosmetics,musical_instrument,second_hand,watches} TR155
[156] {art,books,boutique,kiosk,martial_arts,mobile_telephony,optician,second_hand,wine} TR156
[157] {books,cosmetics,music,watches} TR157
[158] {art,books,boutique,convenience,cosmetics,kiosk,optician,second_hand,watches} TR158
[159] {art,books,boutique,cosmetics,kiosk,mobile_telephony,optician,second_hand,watches,wine} TR159
[160] {convenience} TR160
[161] {books,gift,jewelry,watches} TR161
[162] {convenience,gift,jewelry,watches} TR162
[163] {antiques,gift,jewelry} TR163
[164] {antiques,jewelry} TR164
[165] {books} TR165
[166] {books,music} TR166
[167] {books,jewelry} TR167
[168] {books,convenience,fabric,jewelry,shoes} TR168
[169] {yes} TR169
spatialSets %>% summary()
transactions as itemMatrix in sparse format with
169 rows (elements/itemsets/transactions) and
65 columns (items) and a density of 0.05844333
most frequent items:
convenience       books     clothes       shoes     fashion     (Other)
         62          52          45          39          32         412
element (itemset/transaction) length distribution:
sizes
 1  2  3  4  5  6  7  8  9 10
29 19 37 29 19 17  7  7  4  1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   3.000   3.799   5.000  10.000
includes extended item information - examples:
labels
1 alcohol
2 antiques
3 antiquities;coins
includes extended transaction information - examples:
transactionID
1 TR1
2 TR2
3 TR3
spatialSets %>% itemLabels() # the shop types present in the data
[1] "alcohol" "antiques" "antiquities;coins" "art" "bakery" "beverages"
[7] "bicycle" "books" "boutique" "butcher" "chemist" "clothes"
[13] "cobbler" "computer" "confectionery" "convenience" "cosmetics" "craft"
[19] "deli" "delicatessen" "dishware" "dry_cleaning" "electronics" "erotic"
[25] "fabric" "fashion" "florist" "furniture" "gift" "hairdresser"
[31] "hardware" "herbalist" "houseware" "jewelry" "kiosk" "kitchenware"
[37] "laundry" "martial_arts" "medical_supply" "millinery" "mobile_phone" "mobile_telephony"
[43] "music" "musical_instrument" "music_store" "newsagent" "optician" "organic"
[49] "outdoor" "perfumes" "photo" "pottery" "radiotechnics" "seafood"
[55] "second_hand" "shoes" "stationery" "supermarket" "sweetshop" "ticket"
[61] "toys" "travel_agency" "watches" "wine" "yes"
spatialSets %>% itemFrequencyPlot(topN=20,cex.names=0.8)
spatialSets %>% dissimilarity(method = "affinity") %>% hclust(method="ward.D2") %>% plot
Are there any associations in the locations of the shops?
10/nrow(spatialSets) # how to choose a candidate support: the share that 10 transactions represent
[1] 0.0591716
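The same conversion between a transaction count and a support fraction can be tabulated for a few counts. This is plain arithmetic in base R with n taken from the summary above; it makes no claim about how apriori() rounds the threshold internally.

```r
n <- 169  # number of transactions in spatialSets

# Support fraction that corresponds to 10 transactions
support_for_10 <- 10 / n

# Support fractions for a few neighbouring counts
counts <- 8:12
support_table <- data.frame(count = counts, support = round(counts / n, 4))
```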
itemsets <- spatialSets %>% apriori(parameter=list(target="rules",support=0.06,confidence=0.7))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen maxlen target ext
0.7 0.1 1 none FALSE TRUE 5 0.06 1 10 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 10
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[65 item(s), 169 transaction(s)] done [0.00s].
sorting and recoding items ... [18 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 done [0.00s].
writing ... [7 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
plot(itemsets)
itemsets %>% subset(subset = support>0.06) %>% inspect
lhs rhs support confidence lift count
[1] {fashion} => {shoes} 0.14792899 0.7812500 3.385417 25
[2] {fashion} => {clothes} 0.15384615 0.8125000 3.051389 26
[3] {fashion,supermarket} => {shoes} 0.06508876 1.0000000 4.333333 11
[4] {shoes,supermarket} => {fashion} 0.06508876 0.7857143 4.149554 11
[5] {fashion,shoes} => {clothes} 0.13609467 0.9200000 3.455111 23
[6] {clothes,fashion} => {shoes} 0.13609467 0.8846154 3.833333 23
[7] {clothes,shoes} => {fashion} 0.13609467 0.9583333 5.061198 23
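Rule [1] also shows the asymmetry of rules on real numbers. The sketch below uses only counts printed earlier (32 transactions with fashion and 39 with shoes from the summary, 25 with both from the count column) and computes the confidence in both directions.

```r
n             <- 169
count_fashion <- 32  # from summary(): most frequent items
count_shoes   <- 39  # from summary(): most frequent items
count_both    <- 25  # count column of rule [1]

conf_fashion_shoes <- count_both / count_fashion  # fashion => shoes
conf_shoes_fashion <- count_both / count_shoes    # shoes => fashion

# Lift is the same in both directions
lift <- (count_both / n) / ((count_fashion / n) * (count_shoes / n))
```

The confidence of fashion => shoes is 0.78125, as in the table, while shoes => fashion reaches only about 0.64, below the 0.7 threshold passed to apriori(), which is consistent with the reverse rule being absent from the output.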
itemsets %>% sort(by="support") %>% head(10) %>% plot(method="graph")