package org.apache.mahout.fpm.pfpgrowth.fpgrowth;
public class Pattern implements Comparable<Pattern>
pattern封装了一组item,每个item的support值,整体的support值
private int[] pattern;
private long support = Long.MAX_VALUE;
private long[] supportValues;
判断pattern是否包含。由于pattern已经是升序排列,两个数组比较推进即可。
包含有几个个条件:1.本数组长度要小外来 2.如果外来数组已经大于本数组当前值则不可能3.要么两个数组都达到末尾,要么外来数组没有到末尾
public final boolean isSubPatternOf(Pattern frequentPattern) {
int[] otherPattern = frequentPattern.getPattern();
int otherLength = frequentPattern.length();
if (this.length() > frequentPattern.length()) {
return false;
}
int i = 0;
int otherI = 0;
while (i < length && otherI < otherLength) {
if (otherPattern[otherI] == pattern[i]) {
otherI++;
i++;
} else if (otherPattern[otherI] < pattern[i]) {
otherI++;
} else {
return false;
}
}
return otherI != otherLength || i == length;
}
扩容。增大1.5倍再复制之前的。
private void resize() {
int size = (int) (GROWTH_RATE * length);
if (size < DEFAULT_INITIAL_SIZE) {
size = DEFAULT_INITIAL_SIZE;
}
int[] oldpattern = pattern;
long[] oldSupport = supportValues;
this.pattern = new int[size];
this.supportValues = new long[size];
System.arraycopy(oldpattern, 0, this.pattern, 0, length);
System.arraycopy(oldSupport, 0, this.supportValues, 0, length);
}
增加一个item,总support取所有item最小值
public final void add(int id, long supportCount) {
dirty = true;
if (length >= pattern.length) {
resize();
}
this.pattern[length] = id;
this.supportValues[length++] = supportCount;
this.support = supportCount > this.support ? this.support : supportCount;
}
package org.apache.mahout.fpm.pfpgrowth.fpgrowth;
public final class FrequentPatternMaxHeap
在堆中保存top k个pattern
核心结构
private final PriorityQueue<Pattern> queue;
同一support值的pattern组成一个set
private final OpenLongObjectHashMap<Set<Pattern>> patternIndex;
判断是否可添加
如果容量未满直接可添加,如果已满,若support值大于当前最小也可以添加
public boolean addable(long support) {
return count < maxSize || least.support() <= support;
}
插入新pattern,如果堆满了,存在替换过程,如果有子pattern检查还要维护同support的集合。insert函数主要进行插入前的条件筛选,主要逻辑。
public void insert(Pattern frequentPattern) {
if (frequentPattern.length() == 0) {
return;
}
if (count == maxSize) {
if (frequentPattern.compareTo(least) > 0 && addPattern(frequentPattern)) {
Pattern evictedItem = queue.poll();
least = queue.peek();
if (subPatternCheck) {
patternIndex.get(evictedItem.support()).remove(evictedItem);
}
}
} else {
if (addPattern(frequentPattern)) {
count++;
if (least == null) {
least = frequentPattern;
} else {
if (least.compareTo(frequentPattern) < 0) {
least = frequentPattern;
}
}
}
}
}
具体的插入过程。检查子集,如果堆中已经有超集则不插入。
private boolean addPattern(Pattern frequentPattern) {
if (subPatternCheck) {
Long index = frequentPattern.support();
if (patternIndex.containsKey(index)) {
Set<Pattern> indexSet = patternIndex.get(index);
boolean replace = false;
Pattern replacablePattern = null;
for (Pattern p : indexSet) {
if (frequentPattern.isSubPatternOf(p)) {
return false;
} else if (p.isSubPatternOf(frequentPattern)) {
replace = true;
replacablePattern = p;
break;
}
}
if (replace) {
indexSet.remove(replacablePattern);
if (!indexSet.contains(frequentPattern) && queue.add(frequentPattern)) {
indexSet.add(frequentPattern);
}
return false;
}
queue.add(frequentPattern);
indexSet.add(frequentPattern);
} else {
queue.add(frequentPattern);
Set<Pattern> patternList;
if (!patternIndex.containsKey(index)) {
patternList = new HashSet<Pattern>();
patternIndex.put(index, patternList);
}
patternList = patternIndex.get(index);
patternList.add(frequentPattern);
}
} else {
queue.add(frequentPattern);
}
return true;
}
分享到:
相关推荐
url-pattern的3种写法url-pattern的3种写法
前端开源库-url-patternurl模式,比regex字符串更容易匹配url和其他字符串的模式。将字符串转换为数据或将数据转换为字符串。
PatternLock A Material Design Pattern Lock library with auth flow implementation. Why PatterLock? Battle-tested framework implementation with necessary modifications. Supports XML attributes for ...
JAVA正则表达式--Pattern和Matcher 现在JDK1.4里终于有了自己的正则表达式API包,JAVA程序员可以免去找第三方提供的正则表达式库的周折了,我们现在就马上来了解一下这个SUN提供的迟来恩物- -对我来说确实如此。...
design-pattern-java.pdf
hello-design-pattern Hello world using all 23 kinds of GoF design patterns. public class Main { public static void main(String[] args) throws InstantiationException, IllegalAccessException { //...
Pattern recognition has its origins in engineering, whereas machine learning grew out of computer science. However, these activities can be viewed as two facets of the same field, and together they ...
Factory Method Pattern 工厂三兄弟之工厂方法模式(一) 工厂三兄弟之工厂方法模式(二) 工厂三兄弟之工厂方法模式(三) 工厂三兄弟之工厂方法模式(四) 抽象工厂模式-Abstract Factory Pattern 工厂三兄弟之...
对Servlet技术中的web.xml部署,进行深入解析其中的url-pattern.
对应上传的https://blog.csdn.net/qq_40144558/article/details/88065484中的23种涉及模式,以及各模式中的拓展
Adaptive fringe-pattern__ projection for image saturation avoidance in 3D surface-shape__ measurement.pdf
前端开源库-stylelint-selector-bem-patternStylelint选择器BEM Pattern,一个利用PostSS BEM Linter功能的Stylelint插件
我的文章:http://blog.csdn.net/pengdongneng/article/details/66973587 的测试源码
个人觉得这本书介绍的设计模式知识比较通俗易懂
Microrservices-design -pattern, 微服务设计模式, 微服务设计模式
Design pattern In JavaScrip-design-pattern-in-javascript
head-first-design-pattern—02-observer-pattern(观察者模式),融入了个人的见解,里面包含错误的实现和正确的标准实现,你可以对比学习,加深对模式的理解
Mummford-Pattern-Theory-the-Stochastic-Analysis-of-Real-World-Signals
前端开源库-path-to-glob-pattern路径到全局模式,将文件/目录路径转换为全局模式。
Laravel开发-laravel-pattern-generator php laravel-模式生成器