public class PlingStemmer extends java.lang.Object
This class is part of the Java Tools (see http://mpii.de/yago-naga/javatools). It is licensed under the Creative Commons Attribution License (see http://creativecommons.org/licenses/by/3.0) by the YAGO-NAGA team (see http://mpii.de/yago-naga).
The PlingStemmer stems an English noun (plural or singular) to its singular form. It handles irregular forms such as "firemen"->"fireman" and classical (Latin and Greek) plurals such as "appendices"->"appendix"; compiling these exceptions was a lot of work. Examples:
    System.out.println(PlingStemmer.stem("boy"));        ----> boy
    System.out.println(PlingStemmer.stem("boys"));       ----> boy
    System.out.println(PlingStemmer.stem("biophysics")); ----> biophysics
    System.out.println(PlingStemmer.stem("automata"));   ----> automaton
    System.out.println(PlingStemmer.stem("genus"));      ----> genus
    System.out.println(PlingStemmer.stem("emus"));       ----> emu
Some word forms can be either plural or singular. Examples include "physics" (the science, or the plural of "physic", a medicine), "quarters" (housing, or the plural of "quarter", one fourth) and "people" (the singular of "peoples", or the plural of "person"). In these cases, the stemmer assumes the word is a plural form and returns the singular form. The methods isPlural, isSingular and isSingularAndPlural can be used to distinguish the cases.
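The relationship between the three predicates can be illustrated with a small self-contained sketch. It is not the PlingStemmer source: the class AmbiguitySketch, its sample word lists and its crude fallback rule are hypothetical, and the way isSingular is derived here is an assumption that merely matches the behaviour described on this page (isPlural checks whether stem alters the word, and an ambiguous form counts as both plural and singular).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model for illustration only; this is not the PlingStemmer source. */
final class AmbiguitySketch {

    // Assumed sample entries standing in for the library's singAndPlur set.
    private static final Set<String> SING_AND_PLUR =
            new HashSet<>(Arrays.asList("physics", "quarters", "people"));

    // Assumed singular forms for the ambiguous samples above.
    private static final Map<String, String> KNOWN_STEMS = new HashMap<>();
    static {
        KNOWN_STEMS.put("physics", "physic");
        KNOWN_STEMS.put("quarters", "quarter");
        KNOWN_STEMS.put("people", "person");
    }

    /** Ambiguous forms are treated as plurals; otherwise a crude "strip the s" fallback. */
    static String stem(String s) {
        String known = KNOWN_STEMS.get(s);
        if (known != null) {
            return known;
        }
        if (s.length() > 2 && s.endsWith("s") && !s.endsWith("ss")) {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }

    /** A form counts as plural if stemming changes it. */
    static boolean isPlural(String s) {
        return !s.equals(stem(s));
    }

    /** A form counts as singular if it is ambiguous or if stemming leaves it alone. */
    static boolean isSingular(String s) {
        return SING_AND_PLUR.contains(s) || !isPlural(s);
    }

    /** True only for forms listed as ambiguous. */
    static boolean isSingularAndPlural(String s) {
        return SING_AND_PLUR.contains(s);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"boy", "boys", "people"}) {
            System.out.println(w + ": plural=" + isPlural(w)
                    + ", singular=" + isSingular(w)
                    + ", both=" + isSingularAndPlural(w));
        }
    }
}
```

In this sketch an ambiguous form such as "people" answers true to all three predicates, so a caller who wants to keep such a form unchanged would check isSingularAndPlural before calling stem.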
It cannot be guaranteed that the stemmer correctly stems a plural word or correctly ignores a singular word -- let alone that it treats an ambiguous word form in the way expected by the user.
The PlingStemmer uses material from WordNet.
Modifiers | Name | Description |
---|---|---|
private static java.util.Set<java.lang.String> | category00 | Words that do not have a distinct plural form (like "atlas" etc.) |
private static java.util.Set<java.lang.String> | categoryCHE_CHES | Words that change from "-che" to "-ches" (like "brioche" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categoryEX_ICES | Words that change from "-ex" to "-ices" (like "index" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categoryICS | Words that end with "-ics" and do not exist as nouns without the 's' (like "aerobics" etc.) |
private static java.util.Set<java.lang.String> | categoryIE_IES | Words that change from "-ie" to "-ies" (like "auntie" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categoryIS_ES | Words that change from "-is" to "-es" (like "axis" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categoryIX_ICES | Words that change from "-ix" to "-ices" (like "appendix" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categoryOE_OES | Words that change from "-oe" to "-oes" (like "toe" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categoryON_A | Words that change from "-on" to "-a" (like "phenomenon" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categoryO_I | Words that change from "-o" to "-i" (like "libretto" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categorySE_SES | Words that end in "-se" in their plural forms (like "nurse" etc.) |
private static java.util.Set<java.lang.String> | categorySSE_SSES | Words that change from "-sse" to "-sses" (like "finesse" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categoryUM_A | Words that change from "-um" to "-a" (like "curriculum" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categoryUS_I | Words that change from "-us" to "-i" (like "fungus" etc.), listed in their plural forms |
private static java.util.Set<java.lang.String> | categoryU_US | Words that change from "-u" to "-us" (like "emu" etc.), listed in their plural forms |
private static java.util.Map<java.lang.String, java.lang.String> | irregular | Maps irregular Germanic English plural nouns to their singular form |
private static java.util.Set<java.lang.String> | singAndPlur | Contains word forms that can either be plural or singular |
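How such category sets could drive stemming can be sketched as follows. This is a guess at the general technique, not the library's code: the class CategorySketch, the order of the checks and the sample entries are all hypothetical, and only a few categories are modelled. The helper cut mirrors the documented description of removing as many characters as the suffix has.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model for illustration only; this is not the PlingStemmer source. */
final class CategorySketch {

    // Assumed sample entries; the real sets are far larger.
    private static final Set<String> NO_DISTINCT_PLURAL =
            new HashSet<>(Arrays.asList("atlas"));        // in the spirit of category00
    private static final Set<String> IX_ICES =
            new HashSet<>(Arrays.asList("appendices"));   // listed in plural form
    private static final Set<String> ON_A =
            new HashSet<>(Arrays.asList("phenomena", "automata"));
    private static final Set<String> U_US =
            new HashSet<>(Arrays.asList("emus"));

    // Assumed sample of an irregular plural mapped to its singular.
    private static final Map<String, String> IRREGULAR = new HashMap<>();
    static {
        IRREGULAR.put("children", "child");
    }

    /** Removes suffix.length() characters from the end of s. */
    static String cut(String s, String suffix) {
        return s.substring(0, s.length() - suffix.length());
    }

    /** Looks the word up in each category and rewrites its ending accordingly. */
    static String stem(String s) {
        if (NO_DISTINCT_PLURAL.contains(s)) {
            return s;
        }
        String irregular = IRREGULAR.get(s);
        if (irregular != null) {
            return irregular;
        }
        if (IX_ICES.contains(s)) {
            return cut(s, "ices") + "ix";
        }
        if (ON_A.contains(s)) {
            return cut(s, "a") + "on";
        }
        if (U_US.contains(s)) {
            return cut(s, "s");
        }
        return s; // the sketch has no general rules for regular plurals
    }

    public static void main(String[] args) {
        for (String w : new String[] {"atlas", "children", "appendices", "automata", "emus"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Storing the plural forms in the sets, as the field descriptions say, means a single membership test decides both whether a word is plural and which ending to restore.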
Return Type | Name and description |
---|---|
private static java.lang.String | cut(java.lang.String s, java.lang.String suffix) Cuts a suffix from a string (that is the number of chars given by the suffix) |
private static boolean | greek(java.lang.String s) Returns true if a word is probably Greek |
public static boolean | isPlural(java.lang.String s) Tells whether a word form is plural. |
public static boolean | isSingular(java.lang.String s) Tells whether a word form is singular. |
public static boolean | isSingularAndPlural(java.lang.String s) Tells whether a word form is the singular form of one word and at the same time the plural form of another. |
private static boolean | noLatin(java.lang.String s) Returns true if a word is probably not Latin |
public static java.lang.String | stem(java.lang.String s) Stems an English noun |
Methods inherited from class | Name |
---|---|
class java.lang.Object | java.lang.Object#wait(long), java.lang.Object#wait(long, int), java.lang.Object#wait(), java.lang.Object#equals(java.lang.Object), java.lang.Object#toString(), java.lang.Object#hashCode(), java.lang.Object#getClass(), java.lang.Object#notify(), java.lang.Object#notifyAll() |
isPlural(java.lang.String s): Tells whether a word form is plural. This method only checks whether the stem method alters the word.
isSingular(java.lang.String s): Tells whether a word form is singular. Note that a word form can be both plural and singular.