Quickstart
==========

This is a brief introduction to using PyStemmer.

First, import the library:

>>> import Stemmer

To see what is supported, we'll display the list of available stemming algorithms:

>>> print(Stemmer.algorithms())
[u'arabic', u'armenian', u'basque', u'catalan', u'danish', u'dutch', u'english', u'finnish', u'french', u'german', u'greek', u'hindi', u'hungarian', u'indonesian', u'irish', u'italian', u'lithuanian', u'nepali', u'norwegian', u'porter', u'portuguese', u'romanian', u'russian', u'serbian', u'spanish', u'swedish', u'tamil', u'turkish', u'yiddish']

Now, we'll get an instance of the English stemming algorithm by passing its name to the constructor:

>>> stemmer = Stemmer.Stemmer('english')
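Since the constructor takes one of the names returned by ``Stemmer.algorithms()``, a program that accepts a language name from its users may want to validate that name first. The following is a minimal sketch: ``resolve_algorithm`` is a hypothetical helper, not part of PyStemmer, and it assumes the names in the list are lowercase, as in the output above.

```python
def resolve_algorithm(name, available):
    """Return the algorithm name to pass to Stemmer.Stemmer().

    Hypothetical helper, not part of PyStemmer. `available` is assumed to be
    the list returned by Stemmer.algorithms(), whose entries are lowercase.
    """
    key = name.strip().lower()  # tolerate input such as ' English '
    if key not in available:
        # Report the valid choices so the caller can correct the input.
        raise ValueError(
            "unknown stemming algorithm %r; available: %s"
            % (name, ", ".join(sorted(available)))
        )
    return key
```

A possible call is ``Stemmer.Stemmer(resolve_algorithm('English', Stemmer.algorithms()))``. Raising ``ValueError`` with the list of valid names is a choice of this sketch; it gives users an actionable message instead of relying on whatever error the constructor raises for an unknown name.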

Stem a single word:

>>> print(stemmer.stemWord('cycling'))
cycl
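Stemming is typically used so that different forms of a word can be treated as one. The sketch below groups words by their stem; ``group_by_stem`` is a hypothetical helper, and it only assumes an object with a ``stemWord()`` method, such as the ``stemmer`` created above.

```python
from collections import defaultdict


def group_by_stem(stemmer, words):
    """Map each stem to the list of input words that reduce to it.

    Hypothetical helper, not part of PyStemmer. `stemmer` is any object with
    a stemWord(word) method, for example Stemmer.Stemmer('english').
    """
    groups = defaultdict(list)
    for word in words:
        # Words sharing a stem are collected under the same key, in input order.
        groups[stemmer.stemWord(word)].append(word)
    return dict(groups)
```

With the English stemmer, the results shown in this document mean that 'cycling' (stem 'cycl') and 'cyclist' (stem 'cyclist') would end up in different groups.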

Stem a list of words:

>>> print(stemmer.stemWords(['cycling', 'cyclist']))
['cycl', 'cyclist']
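Real input usually arrives as running text rather than as a word list. A minimal sketch, assuming a simple ``\w+`` regular expression is an acceptable tokenizer and that lowercasing before stemming is wanted (both are choices of this example, not PyStemmer behavior): the hypothetical ``stem_text`` helper passes all tokens to ``stemWords()`` in a single call.

```python
import re

# Deliberately simple tokenizer assumed for this sketch: runs of word
# characters. Real applications may need language-aware tokenization.
_WORD = re.compile(r"\w+", re.UNICODE)


def stem_text(stemmer, text):
    """Split text into lowercase words and stem them with one stemWords() call.

    Hypothetical helper, not part of PyStemmer. `stemmer` is any object with a
    stemWords(list_of_words) method, for example Stemmer.Stemmer('english').
    """
    words = [w.lower() for w in _WORD.findall(text)]
    return stemmer.stemWords(words)
```

Collecting the tokens first and making one ``stemWords()`` call keeps the example close to the list-based usage above.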

Strings supplied as input are assumed to be UTF-8 encoded.
Unicode strings are accepted as well; as the output shows, each result has the same string type as its input:

>>> print(stemmer.stemWords(['cycling', u'cyclist']))
['cycl', u'cyclist']
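If text may arrive as bytes in an encoding other than UTF-8, passing those bytes straight to the stemmer would contradict the UTF-8 assumption. One option, sketched here with a hypothetical ``to_text`` helper, is to decode the input yourself and pass Unicode strings, which the stemmer accepts as shown above.

```python
def to_text(word, encoding="utf-8"):
    """Return `word` as a text string, decoding bytes with `encoding`.

    Hypothetical helper, not part of PyStemmer: decoding explicitly makes the
    input encoding visible instead of relying on the UTF-8 assumption.
    """
    if isinstance(word, bytes):
        return word.decode(encoding)
    return word  # already text; pass through unchanged
```

For example, ``stemmer.stemWords([to_text(w, 'latin-1') for w in raw_words])``, where ``raw_words`` stands for your own Latin-1 encoded byte strings.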

Each stemmer instance uses a cache to speed up processing of common words.
By default, the cache holds 10000 words; its size can be read and changed
through the maxCacheSize attribute, and setting it to 0 disables the cache
entirely:

>>> print(stemmer.maxCacheSize)
10000

>>> stemmer.maxCacheSize = 1000

>>> print(stemmer.maxCacheSize)
1000
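To make the trade-off concrete, here is a generic least-recently-used cache in front of an arbitrary stemming function. It is an illustrative sketch only: ``CachedStemmer``, ``stem_word`` and ``max_size`` are names invented for this example, and PyStemmer's own cache may use a different eviction strategy.

```python
from collections import OrderedDict


class CachedStemmer:
    """Wrap a word -> stem function with a bounded least-recently-used cache.

    Illustrative sketch, not PyStemmer's internal cache. A max_size of 0
    disables caching, mirroring the maxCacheSize = 0 behaviour described above.
    """

    def __init__(self, stem, max_size=10000):
        self._stem = stem            # function mapping a word to its stem
        self.max_size = max_size     # may be changed after construction
        self._cache = OrderedDict()  # word -> stem, least recently used first

    def stem_word(self, word):
        if self.max_size <= 0:
            return self._stem(word)  # caching disabled: always recompute
        if word in self._cache:
            self._cache.move_to_end(word)  # mark as most recently used
            return self._cache[word]
        result = self._stem(word)
        self._cache[word] = result
        # Evict least recently used entries until the size limit holds.
        while len(self._cache) > self.max_size:
            self._cache.popitem(last=False)
        return result
```

The wrapper works with any function mapping a word to its stem; with PyStemmer itself you would simply adjust maxCacheSize rather than add a second cache.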