Build a Custom Solr Filter to Handle Unit Conversions

By sumeetmi2 March 14, 2016

Recently, I came across a use case that required handling units of weight in the index. For instance, searches for 2kg and 2000g should return the same set of results.

To achieve this, I wrote a custom Solr filter that works with KeywordTokenizer to convert all units of weight in the incoming request to a single unit (g), so every measurement is indexed as a number of grams. The stored value is left untouched, so documents are still returned with their original units (kg/g/mg).
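To make the idea concrete before the Solr code, here is a minimal sketch of the same principle using a toy in-memory map instead of a real index. The class `NormalizedWeightIndexSketch` and its methods are hypothetical names invented for this illustration; the point is only that the lookup key is normalized to grams while the stored value keeps its original unit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a toy "index", not Solr or Lucene code.
class NormalizedWeightIndexSketch {

    // Indexed term (weight in grams, as text) -> stored values exactly as submitted.
    private final Map<String, List<String>> postings = new HashMap<>();

    // Hypothetical helper: turns "<number><kg|g|mg>" into a gram value rendered as text.
    static String toGramsTerm(String weight) {
        String value = weight.trim().toLowerCase(Locale.ROOT);
        float grams;
        // "kg" and "mg" are checked first because both also end in "g".
        if (value.endsWith("kg")) {
            grams = Float.parseFloat(value.substring(0, value.length() - 2)) * 1000;
        } else if (value.endsWith("mg")) {
            grams = Float.parseFloat(value.substring(0, value.length() - 2)) / 1000;
        } else if (value.endsWith("g")) {
            grams = Float.parseFloat(value.substring(0, value.length() - 1));
        } else {
            throw new IllegalArgumentException("Unsupported weight: " + weight);
        }
        return String.valueOf(grams);
    }

    // Index time: normalize the key, keep the original text as the stored value.
    void add(String storedValue) {
        postings.computeIfAbsent(toGramsTerm(storedValue), k -> new ArrayList<>()).add(storedValue);
    }

    // Query time: normalize the query the same way, then look it up.
    List<String> search(String queryWeight) {
        return postings.getOrDefault(toGramsTerm(queryWeight), List.of());
    }

    public static void main(String[] args) {
        NormalizedWeightIndexSketch index = new NormalizedWeightIndexSketch();
        index.add("1000g");
        index.add("2kg");
        index.add("1kg");
        System.out.println(index.search("1kg"));   // [1000g, 1kg]
        System.out.println(index.search("2000g")); // [2kg]
    }
}
```

Because the same normalization runs on both the indexed value and the query, 1kg and 1000g land on the same key, yet each hit still shows the unit it was submitted with.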

First, we need to write a custom TokenFilter and TokenFilterFactory.

UnitConversionFilter.java

[code language="java"]

package com.solr.custom.filter.test;

import java.io.IOException;
import java.util.Locale;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

/**
 * Rewrites a weight token such as "2kg" or "500mg" to its value in grams.
 *
 * @author SumeetS
 */
public final class UnitConversionFilter extends TokenFilter {

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    /**
     * @param input the upstream token stream
     */
    public UnitConversionFilter(TokenStream input) {
        super(input);
    }

    /* (non-Javadoc)
     * @see org.apache.lucene.analysis.TokenStream#incrementToken()
     */
    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        String inputWt = termAtt.toString(); // assumed format: <number><unit>, e.g. 1kg or 500mg
        float valInGrams = convertUnit(inputWt);
        // Replace the token text with the gram value, e.g. "1kg" becomes "1000.0"
        termAtt.setEmpty().append(String.valueOf(valInGrams));
        return true;
    }

    // Converts "<number><kg|g|mg>" to grams; decimal amounts such as 2.5kg are accepted too.
    private float convertUnit(String field) {
        String value = field.trim().toLowerCase(Locale.ROOT);
        // Check "kg" and "mg" before "g", since both also end in "g".
        if (value.endsWith("kg")) {
            return parseAmount(value, 2) * 1000;
        } else if (value.endsWith("mg")) {
            return parseAmount(value, 2) / 1000;
        } else if (value.endsWith("g")) {
            return parseAmount(value, 1);
        }
        throw new IllegalArgumentException("Unsupported weight format: " + field);
    }

    // Parses the numeric part that precedes a unit suffix of the given length.
    private static float parseAmount(String value, int unitLength) {
        return Float.parseFloat(value.substring(0, value.length() - unitLength).trim());
    }
}

[/code]

UnitConversionTokenFilterFactory.java

[code language="java"]

package com.solr.custom.filter.test;

import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.util.TokenFilterFactory;

/**
 * @author SumeetS
 */
public class UnitConversionTokenFilterFactory extends TokenFilterFactory {

    /**
     * @param args the attributes declared on the filter element in the schema
     */
    public UnitConversionTokenFilterFactory(Map<String, String> args) {
        super(args);
        // This filter takes no parameters of its own, so anything left over is a mistake.
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    /* (non-Javadoc)
     * @see org.apache.lucene.analysis.util.TokenFilterFactory#create(org.apache.lucene.analysis.TokenStream)
     */
    @Override
    public TokenStream create(TokenStream input) {
        return new UnitConversionFilter(input);
    }

}

[/code]

NOTE: When you extend TokenFilter and TokenFilterFactory, make sure to change the protected constructors to public; otherwise Solr throws NoSuchMethodException during plugin init.
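The exception in the note is consistent with a reflective constructor lookup, since Java's `Class.getConstructor` only finds public constructors. The sketch below demonstrates that behavior with two stand-in classes. `ProtectedCtorFactory` and `PublicCtorFactory` are hypothetical and are not Lucene classes, and that Solr's plugin loader performs exactly this lookup is an assumption here.

```java
import java.util.Map;

// Illustrative only: shows why a non-public constructor is invisible to getConstructor.
class ConstructorVisibilitySketch {

    // Stand-in for a factory whose constructor was left protected.
    static class ProtectedCtorFactory {
        protected ProtectedCtorFactory(Map<String, String> args) { }
    }

    // Stand-in for a factory whose constructor was made public.
    static class PublicCtorFactory {
        public PublicCtorFactory(Map<String, String> args) { }
    }

    // Returns true when a public constructor taking a Map can be found reflectively.
    static boolean hasPublicMapConstructor(Class<?> type) {
        try {
            type.getConstructor(Map.class); // only public constructors are considered
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasPublicMapConstructor(ProtectedCtorFactory.class)); // false
        System.out.println(hasPublicMapConstructor(PublicCtorFactory.class));    // true
    }
}
```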

Now compile the classes above and export them into a jar, say customUnitConversionFilterFactory.jar.

Steps to Deploy Your Jar Into Solr

1. Place your jar file under /lib

2. Add an entry to solrconfig.xml so that Solr can find your custom jar.

[code language="xml"]

<lib dir="../../../lib/" regex=".*\.jar" />

[/code]

3. Add custom fieldType and field in your schema.xml

[code language="xml"]

<field name="unitConversion" type="unitConversion" indexed="true" stored="true"/>
<fieldType name="unitConversion" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="com.solr.custom.filter.test.UnitConversionTokenFilterFactory" />
    </analyzer>
</fieldType>

[/code]

4. Now restart Solr and browse to the Solr console//documents

5. Add documents in your index like below:

{"id":"tmp1","unitConversion":"1000g"}
{"id":"tmp2","unitConversion":"2kg"}
{"id":"tmp3","unitConversion":"1kg"}

6. Query your index.

Query1: querying for documents with 1kg

http://localhost:8983/solr/core1/select?q=*%3A*&fq=unitConversion%3A1kg&wt=json&indent=true

Result:

{
 "responseHeader":{
 "status":0,
 "QTime":0,
 "params":{
 "q":"*:*",
 "indent":"true",
 "fq":"unitConversion:1kg",
 "wt":"json"}},
 "response":{"numFound":2,"start":0,"docs":[
 {
 "id":"tmp1",
 "unitConversion":"1000g",
 "_version_":1524411029806645248},
 {
 "id":"tmp3",
 "unitConversion":"1kg",
 "_version_":1524411081738420224}]
 }}

Query2: querying for documents with 2kg

http://localhost:8983/solr/core1/select?q=*%3A*&fq=unitConversion%3A2kg&wt=json&indent=true

Result:

{
 "responseHeader":{
 "status":0,
 "QTime":0,
 "params":{
 "q":"*:*",
 "indent":"true",
 "fq":"unitConversion:2kg",
 "wt":"json"}},
 "response":{"numFound":1,"start":0,"docs":[
 {
 "id":"tmp2",
 "unitConversion":"2kg",
 "_version_":1524411089834475520}]
 }}
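The filter queries in Query1 and Query2 are URL-encoded by hand (`:` becomes `%3A`). If you build such requests from code, a small helper can do the encoding. This is a minimal sketch using only the JDK; `SelectUrlSketch` and `selectUrl` are hypothetical names, and the base URL is the one used in this post.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: builds a select URL like the ones shown above.
class SelectUrlSketch {

    // Encodes q and fq and appends the same wt/indent parameters used in this post.
    static String selectUrl(String coreUrl, String q, String fq) {
        return coreUrl + "/select"
                + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8)
                + "&wt=json&indent=true";
    }

    public static void main(String[] args) {
        // Prints the Query2 URL:
        // http://localhost:8983/solr/core1/select?q=*%3A*&fq=unitConversion%3A2kg&wt=json&indent=true
        System.out.println(selectUrl("http://localhost:8983/solr/core1", "*:*", "unitConversion:2kg"));
    }
}
```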

Query3: let’s try faceting

http://localhost:8983/solr/core1/select?q=*%3A*&rows=0&wt=json&indent=true&facet=true&facet.field=unitConversion

Result:

{
 "responseHeader":{
 "status":0,
 "QTime":1,
 "params":{
 "q":"*:*",
 "facet.field":"unitConversion",
 "indent":"true",
 "rows":"0",
 "wt":"json",
 "facet":"true"}},
 "response":{"numFound":335,"start":0,"docs":[]
 },
 "facet_counts":{
 "facet_queries":{},
 "facet_fields":{
 "unitConversion":[
 "1000.0",2,
 "2000.0",1]},
 "facet_dates":{},
 "facet_ranges":{},
 "facet_intervals":{},
 "facet_heatmaps":{}}}

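The facet values above (1000.0, 2000.0) are the indexed terms, which are simply the text form of a Java float. That form is not always convenient: `Float.toString` switches to scientific notation for values of 10^7 and above, so a very heavy item would be indexed under a term like 1.0E7. One possible alternative, which is not what the filter in this post does, is to build the term with `BigDecimal`. `CanonicalGramTermSketch` is a hypothetical name for this illustration.

```java
import java.math.BigDecimal;
import java.util.Locale;

// Illustrative only: produces a plain decimal gram term without float formatting quirks.
class CanonicalGramTermSketch {

    // Converts "<number><kg|g|mg>" to grams and renders it without exponent or trailing zeros.
    static String canonicalGrams(String weight) {
        String value = weight.trim().toLowerCase(Locale.ROOT);
        BigDecimal grams;
        // "kg" and "mg" are checked first because both also end in "g".
        if (value.endsWith("kg")) {
            grams = new BigDecimal(value.substring(0, value.length() - 2)).movePointRight(3);
        } else if (value.endsWith("mg")) {
            grams = new BigDecimal(value.substring(0, value.length() - 2)).movePointLeft(3);
        } else if (value.endsWith("g")) {
            grams = new BigDecimal(value.substring(0, value.length() - 1));
        } else {
            throw new IllegalArgumentException("Unsupported weight: " + weight);
        }
        return grams.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(10000f * 1000f + "");        // 1.0E7 (float text form)
        System.out.println(canonicalGrams("10000kg")); // 10000000
        System.out.println(canonicalGrams("1kg"));     // 1000
        System.out.println(canonicalGrams("1000g"));   // 1000
        System.out.println(canonicalGrams("250mg"));   // 0.25
    }
}
```

With a term format like this, the facet labels would read 1000 and 2000. Existing documents would need to be reindexed after any change to the term format.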
This is just a basic implementation. An additional field could identify the type of unit (weight, length, and so on), and the conversion could then be chosen based on it.
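As a rough sketch of that extension, the conversion factors can live in a table keyed by the kind of measurement, so the same suffix handling works for more than weights. Everything here (`UnitTableSketch`, the `Kind` enum, and the length units) is hypothetical and only meant to show the shape of such a table.

```java
import java.util.Map;

// Illustrative only: a lookup table of conversion factors per kind of measurement.
class UnitTableSketch {

    enum Kind { WEIGHT, LENGTH }

    // Factors to a base unit: grams for weight, metres for length.
    static final Map<Kind, Map<String, Double>> FACTORS = Map.of(
            Kind.WEIGHT, Map.of("kg", 1000.0, "g", 1.0, "mg", 0.001),
            Kind.LENGTH, Map.of("km", 1000.0, "m", 1.0, "cm", 0.01));

    // Converts an amount to the base unit of its kind, rejecting units the kind does not know.
    static double toBaseUnit(Kind kind, double amount, String unit) {
        Double factor = FACTORS.get(kind).get(unit);
        if (factor == null) {
            throw new IllegalArgumentException("Unknown " + kind + " unit: " + unit);
        }
        return amount * factor;
    }

    public static void main(String[] args) {
        System.out.println(toBaseUnit(Kind.WEIGHT, 2, "kg")); // 2000.0
        System.out.println(toBaseUnit(Kind.LENGTH, 3, "km")); // 3000.0
    }
}
```

Adding a new unit then means adding a table entry rather than another branch in the filter.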

Further improvements include support for range queries on values with units.
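Range queries are awkward with the current terms because text terms compare character by character, so 500.0 sorts after 2000.0. One possible approach, offered here as an assumption rather than something the filter above implements, is to index a fixed-width, zero-padded count of milligrams so that text order matches numeric order. `SortableWeightTermSketch` and the width of 15 digits are arbitrary choices for this sketch.

```java
import java.util.Locale;

// Illustrative only: encodes weights so that string order equals numeric order.
class SortableWeightTermSketch {

    // Converts a whole-number amount in kg, g, or mg to milligrams.
    static long toMilligrams(long amount, String unit) {
        switch (unit) {
            case "kg": return amount * 1_000_000L;
            case "g":  return amount * 1_000L;
            case "mg": return amount;
            default:   throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    // Zero-pads to 15 digits so shorter numbers sort before longer ones.
    static String sortableTerm(long milligrams) {
        if (milligrams < 0) {
            throw new IllegalArgumentException("Negative weight: " + milligrams);
        }
        return String.format(Locale.ROOT, "%015d", milligrams);
    }

    public static void main(String[] args) {
        String half = sortableTerm(toMilligrams(500, "g"));
        String two = sortableTerm(toMilligrams(2, "kg"));
        System.out.println(half);                             // 000000000500000
        System.out.println(two);                              // 000000002000000
        System.out.println(half.compareTo(two) < 0);          // true: 500g sorts before 2kg
        System.out.println("500.0".compareTo("2000.0") < 0);  // false: the float text form does not
    }
}
```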
