linest
Reading Code - InputMapper

 
package org.apache.mahout.clustering.conversion;
Purpose: read each line of text input and convert it into a vector for output.

private static final Pattern SPACE = Pattern.compile(" ");
private Constructor<?> constructor;


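Load the vector implementation class through reflection. The class name is read from the configuration key vector.implementation.class.name, and the constructor that takes an int (the vector size) is cached in the constructor field:
Configuration conf = context.getConfiguration();
// Name of the Vector implementation, taken from the job configuration
String vectorImplClassName = conf.get("vector.implementation.class.name");
Class<? extends Vector> outputClass = conf.getClassByName(vectorImplClassName).asSubclass(Vector.class);
// Cache the (int size) constructor so each map() call can create vectors of any length
constructor = outputClass.getConstructor(int.class);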
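The reflective lookup above can be tried outside Hadoop with a small sketch. It relies on hypothetical stand-ins: SimpleVector plays the role of Mahout's Vector and DenseSimpleVector that of a concrete implementation. Class.forName replaces Hadoop's Configuration.getClassByName, and the class name is built in code rather than read from vector.implementation.class.name. Checked reflection exceptions are wrapped in IllegalStateException so callers need not declare them.

```java
import java.lang.reflect.Constructor;

public class ReflectiveVectorFactory {

  /** Hypothetical stand-in for Mahout's Vector interface (not the real API). */
  public interface SimpleVector {
    int size();
    void set(int index, double value);
    double get(int index);
  }

  /** Hypothetical dense implementation exposing the public (int) constructor the lookup needs. */
  public static class DenseSimpleVector implements SimpleVector {
    private final double[] values;

    public DenseSimpleVector(int cardinality) {
      this.values = new double[cardinality];
    }

    @Override public int size() { return values.length; }
    @Override public void set(int index, double value) { values[index] = value; }
    @Override public double get(int index) { return values[index]; }
  }

  /** Resolve the class by name, check it implements SimpleVector, and return its (int) constructor. */
  public static Constructor<? extends SimpleVector> lookupConstructor(String className) {
    try {
      Class<? extends SimpleVector> outputClass =
          Class.forName(className).asSubclass(SimpleVector.class);
      return outputClass.getConstructor(int.class);
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Create a vector of the requested size through the cached constructor. */
  public static SimpleVector newVector(Constructor<? extends SimpleVector> ctor, int size) {
    try {
      return ctor.newInstance(size);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Nested classes are looked up by their binary name: Outer$Inner.
    String name = ReflectiveVectorFactory.class.getName() + "$DenseSimpleVector";
    Constructor<? extends SimpleVector> ctor = lookupConstructor(name);
    SimpleVector v = newVector(ctor, 3);
    System.out.println(v.getClass().getSimpleName() + " size=" + v.size());
  }
}
```

Looking the constructor up once and reusing it mirrors why the mapper stores it in a field instead of resolving the class on every record.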



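Split the line on spaces with the precompiled regex and collect the values in a list of Double (an ArrayList). Consecutive spaces produce empty tokens, which are skipped:
// Split on single spaces; a run of spaces leaves empty strings in the array
String[] numbers = InputMapper.SPACE.split(values.toString());
Collection<Double> doubles = new ArrayList<Double>();
for (String value : numbers) {
  // Skip the empty tokens left by consecutive spaces
  if (value.length() > 0) {
    doubles.add(Double.valueOf(value));
  }
}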
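A minimal, dependency-free sketch of the splitting step, assuming the same single-space Pattern as SPACE. It shows why the empty-token check is needed: a leading space or a run of spaces leaves empty strings in the split result, while trailing empty strings are dropped by Pattern.split. Only the space character acts as a separator, so tab-separated input would not be split. SpaceSplitDemo and parse are illustrative names, not part of Mahout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SpaceSplitDemo {

  // Same single-space pattern as InputMapper.SPACE.
  private static final Pattern SPACE = Pattern.compile(" ");

  /** Split a line on single spaces and parse each non-empty token as a double. */
  public static List<Double> parse(String line) {
    String[] numbers = SPACE.split(line);
    List<Double> doubles = new ArrayList<>();
    for (String value : numbers) {
      // Leading or consecutive spaces yield empty tokens; skip them.
      if (!value.isEmpty()) {
        doubles.add(Double.valueOf(value));
      }
    }
    return doubles;
  }

  public static void main(String[] args) {
    String line = " 1.0  2.5 3";
    System.out.println(SPACE.split(line).length); // 5 (raw tokens, including two empty ones)
    System.out.println(parse(line));              // [1.0, 2.5, 3.0]
  }
}
```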


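Build the vector, wrap it in a VectorWritable, and write it out. The key is the vector length (as Text) and the value is the vector itself:
Vector result = (Vector) constructor.newInstance(doubles.size());
int index = 0;
for (Double d : doubles) {
  result.set(index++, d);
}
VectorWritable vectorWritable = new VectorWritable(result);
// After the loop, index equals the vector length, which becomes the output key
context.write(new Text(String.valueOf(index)), vectorWritable);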
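The vector-building step can be sketched the same way, under the assumption that a plain double[] stands in for Mahout's Vector and a Map.Entry stands in for the (Text, VectorWritable) pair passed to context.write. VectorRecordDemo and toRecord are hypothetical names. After the loop, index equals the number of values, which is why it can serve as the key.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class VectorRecordDemo {

  /**
   * Fill a fixed-size array in list order and pair it with its length as a string key.
   * double[] is a stand-in for Mahout's Vector; the entry stands in for the emitted record.
   */
  public static Map.Entry<String, double[]> toRecord(List<Double> doubles) {
    double[] result = new double[doubles.size()];
    int index = 0;
    for (Double d : doubles) {
      result[index++] = d; // auto-unboxing Double -> double
    }
    // index now equals the number of elements, i.e. the vector length.
    return new AbstractMap.SimpleImmutableEntry<>(String.valueOf(index), result);
  }

  public static void main(String[] args) {
    Map.Entry<String, double[]> record = toRecord(Arrays.asList(1.0, 2.5, 3.0));
    System.out.println(record.getKey() + " -> " + Arrays.toString(record.getValue())); // 3 -> [1.0, 2.5, 3.0]
  }
}
```

Because the key carries only the length, every input line with the same number of values is emitted under the same key.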

