Analysing email addresses with Hibernate Search & Solr

October 15th, 2009

At first glance, WordDelimiterFilterFactory seems like the most appropriate solution to the problem: it splits words into subwords on intra-word delimiters.

So:

  • “email@someserver.com” -> “email”, “someserver”, “com”

This works well, except that it splits on all intra-word delimiters and, when combined with the StandardAnalyzer, also splits on letter-number transitions.

So:

  • “email@some-server.com” -> “email”, “some”, “server”, “com”
  • “email@server5.com” -> “email”, “server”, “5”, “com”

That is fine unless your users want to search for, say, “server5” or “some-server” and the search query itself is not run through the analyser.
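To make that splitting concrete, here is a rough sketch in plain Java that mimics the behaviour described above. It is only an approximation: the class name and the regular expression are my own illustration, not the actual WordDelimiterFilter implementation. It breaks on any non-alphanumeric character and on letter/digit boundaries.

```java
import java.util.Arrays;
import java.util.List;

public class DelimiterSplitSketch {

    // Hypothetical stand-in for the splitting described above:
    // break on runs of non-alphanumeric characters, and on the
    // zero-width boundary between a letter and a digit.
    private static final String SPLIT_PATTERN =
        "[^A-Za-z0-9]+|(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])";

    static List<String> split(String text) {
        return Arrays.asList(text.split(SPLIT_PATTERN));
    }

    public static void main(String[] args) {
        System.out.println(split("email@some-server.com")); // [email, some, server, com]
        System.out.println(split("email@server5.com"));     // [email, server, 5, com]
    }
}
```

Neither “some-server” nor “server5” survives as a single token, which is why an unanalysed query for either finds nothing.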

So the strategy I’ve taken is as follows:

  1. Use the PatternTokenizerFactory and split on “.” and “@”
  2. Filter to lower case using LowerCaseFilterFactory
  3. Store the full email address in a separate field

Which now means that:

  • “email@some-server.com” -> “email”, “some-server”, “com”
  • “email@server5.com” -> “email”, “server5”, “com”

Searches for “server5” and “some-server” now return matches.
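The first two steps of that strategy can be tried out without Solr at all. The sketch below uses plain Java string handling with the same pattern as the tokenizer step; the class and method names are hypothetical, and dropping empty pieces is my own tidy-up rather than something taken from the Solr factories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EmailTokenSketch {

    // Same pattern as in the strategy above: split on "." and "@" only.
    static final String PATTERN = "\\.|\\@";

    // Split the address, then lower-case each piece.
    static List<String> tokenize(String email) {
        List<String> tokens = new ArrayList<>();
        for (String part : email.split(PATTERN)) {
            if (!part.isEmpty()) { // assumption: skip empty pieces
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Email@Some-Server.com")); // [email, some-server, com]
        System.out.println(tokenize("email@server5.com"));     // [email, server5, com]
    }
}
```

Because hyphens and digits are no longer treated as boundaries, “some-server” and “server5” each stay in one piece.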

There is naturally some room for improvement. For example, a user who searches for “server” and “5” would reasonably expect anything that matches “server5” to be returned. At the moment I’m handling this by allowing wildcard searches, so “server*” does the trick. It may need revisiting, but only time will tell…
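As a small illustration of why the wildcard helps, the sketch below checks a query against a list of already-indexed tokens in plain Java. It is a simplification with hypothetical names, not how Lucene evaluates wildcard queries internally, and it only handles a single trailing “*”.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class PrefixMatchSketch {

    // Returns true if the query matches any token. A trailing "*" is
    // treated as a prefix search; anything else must match exactly.
    static boolean matches(List<String> tokens, String query) {
        String q = query.toLowerCase(Locale.ROOT);
        if (q.endsWith("*")) {
            String prefix = q.substring(0, q.length() - 1);
            for (String token : tokens) {
                if (token.startsWith(prefix)) {
                    return true;
                }
            }
            return false;
        }
        return tokens.contains(q);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("email", "server5", "com");
        System.out.println(matches(tokens, "server"));  // false
        System.out.println(matches(tokens, "server*")); // true
    }
}
```

A search for “5” on its own still finds nothing here, which is the part that may need revisiting.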

In case you’re wondering how that gets put together, here’s a source snippet:

@Entity
@Indexed
@Table(name = "user", catalog = "somedb")
@AnalyzerDef(
  name = "email",
  // Split the address on "." and "@" only
  tokenizer = @TokenizerDef(
    factory = PatternTokenizerFactory.class,
    params = {
      @Parameter(name = "pattern", value = "\\.|\\@")
    }),
  // Lower-case every token
  filters = {
    @TokenFilterDef(factory = LowerCaseFilterFactory.class)
  })
public class User implements java.io.Serializable {
...
    @Column(name = "email", nullable = false, unique = true)
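    @Fields( {
      // Whole address, indexed as a single term
      @Field(name = "fullEmail", index = Index.UN_TOKENIZED, store = Store.YES),
      // Address split into tokens by the "email" analyzer defined above
      @Field(index = Index.TOKENIZED, analyzer = @Analyzer(definition = "email"), store = Store.YES)
    })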
    public String getEmail() {
      return this.email;
    }
...
}

Additional resources:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

HOWTO: Remove GWT package from url when using gwt-maven

March 13th, 2009

Seriously, who wants a URL like /my.long.package.name.Application/Application.html?

To remedy this, follow these steps:

  1. Make the webapp directory the same as the output directory for your GWT code.
    ...
    <plugin>
    	<groupId>org.apache.maven.plugins</groupId>
    	<artifactId>maven-war-plugin</artifactId>
    	<configuration>
    		<webappDirectory>${project.build.directory}/${project.build.finalName}/my.long.package.name.Application</webappDirectory>
    	</configuration>
    </plugin>
    ...
  2. Ensure that your RPC servlets don’t have the package as a prefix.
    ...
    <plugin>
    	<groupId>com.totsp.gwt</groupId>
    	<artifactId>maven-googlewebtoolkit2-plugin</artifactId>
    	...
    	<configuration>
    		...
    		<webXmlServletPathAsIs>true</webXmlServletPathAsIs>
    	</configuration>
    	...
    </plugin>
    ...
  3. Change your index.html file to point to your app.
    From:

    <meta http-equiv="REFRESH" content="0;url=my.long.package.name.Application/Application.html">

    To:

    <meta http-equiv="REFRESH" content="0;url=Application.html">
  4. You’re done.

Additional resources:
http://groups.google.com/group/Google-Web-Toolkit/browse_thread/thread/f8b06676098b8cc6

http://groups.google.com/group/gwt-maven/browse_thread/thread/a46f540ca823e3d3/7d5febf0776958db?lnk=gst&q=rename#7d5febf0776958db

HOWTO: Run your Java web app as the root context on Tomcat

March 13th, 2009

If you have a context XML fragment in your WAR (META-INF/context.xml), then it’s as easy as renaming your WAR file to ROOT.war.

If you’re using Maven, just alter your POM to have:

<build>
	<finalName>ROOT</finalName>
	...
</build>

Simple when you know how…