Skåne Sjælland Linux User Group - http://www.sslug.dk
 

Re: [LOCALE] Language Info Needed for Aspell (fwd)



Hi Kevin

We have discussed your section on 'Compound Words' on our
newsgroup/mailing list for locale matters. Here is the result.

The set of formal rules is probably not complete, but it should cover
most cases.

.Henrik

Kevin Atkinson wrote:

> Compound Words
> ==============
>
> In some languages, such as German, it is acceptable to string two
> words together, thus forming a compound word.  However, there are
> rules to when this can be done.  Furthermore, it is not always
> sufficient to simply concatenate the two words.  For example,
> sometimes a letter is inserted between the two words.  I tried
> implementing support for compound words in Aspell but it was too
> limiting and no one used it.  Before I try implementing it again I
> want to know all the issues involved.

This section could benefit from a reformulation and extension.

In some languages, such as German, new words can be created by combining
existing words.  Since in most cases it is incorrect to write such a
compound word as separate parts, it is important that a spell checker
is aware of the rules for forming compounds.  Compound words are in
general not created by simply joining two valid spellings together.  In
the Germanic languages, a compound is created by joining a special
compound form of the first word (which may be identical to some regular
form) with any inflected form of the second word.  The result is a word
of the same class and inflection as the second word, with the exception
that in some cases where the first word is a proper name, the compound
is also a proper name.
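
The joining rule above can be sketched in a few lines of Python.  This
is only a minimal illustration, not a description of how Aspell works:
the two tables and the function name analyse_compound are made up for
the example, and a real word list would be far larger.

```python
# Toy tables, assumed for illustration only (not taken from a real word list).
# Compound forms that may appear as the first part of a compound.
COMPOUND_FORMS = {"radio"}
# Stand-alone (inflected) forms mapped to their word class.
WORDS = {"cykel": "noun", "cykler": "noun", "bil": "noun"}


def analyse_compound(word):
    """Return every (first, second, word_class) split allowed by the tables.

    The first part must be a compound form and the second part any known
    stand-alone form.  The compound inherits the class of the second part.
    """
    analyses = []
    for i in range(1, len(word)):
        first, second = word[:i], word[i:]
        if first in COMPOUND_FORMS and second in WORDS:
            analyses.append((first, second, WORDS[second]))
    return analyses


print(analyse_compound("radiocykler"))  # [('radio', 'cykler', 'noun')]
print(analyse_compound("cykelradio"))   # [] ("cykel" is not listed as a compound form)
```

Note that "cykelradio" is rejected only because the toy table has no
compound form for "cykel", not because the word is impossible.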

One problem that should be ignored is that not all compounds are equally
meaningful.  Most compounds do have some potential meaning, though, and
should thus be accepted.  An example is the potential Danish word
'radiocykel', a compound of 'radio' (radio) and 'cykel' (bicycle).  It
might seem meaningless, but since the compound is completely analogous
to 'radiobil' (bumper car), all it takes for it to be usable is a
little engineering.  Furthermore, if two words can form a compound, it
is much more likely that they should be written as one word than as two.

More than one compound form of words [da, fo, se]
-------------------------------------------------

Some words have more than one form that is valid as the first part of a
compound word.  Since these forms are not necessarily valid words
themselves, the word lists need to distinguish between stand-alone
forms and compound forms of words.

An example of this is the Swedish word "gata" (street), which in
compound words is spelled either "gat" or "gatu" (in most cases
interchangeable), neither of which is a word on its own.

Two compound forms cannot always be used interchangeably.  Which form
is used in a given compound mostly follows no predictable rule, and in
every case the choice reduces to a question of meaning, which we have
already said should be ignored.
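
One possible way to keep stand-alone forms and compound forms apart in
a word list is to store them in separate sets per entry.  The sketch
below is an assumption about how such an entry could look, not an
existing Aspell format.  The class Entry, its field names and the
stand-alone forms listed for "gata" are chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One word-list entry with its two kinds of forms kept apart."""
    lemma: str
    standalone_forms: set = field(default_factory=set)
    compound_forms: set = field(default_factory=set)


# Illustrative entry.  The compound forms come from the text above;
# the stand-alone forms are an assumption of this sketch.
GATA = Entry(
    lemma="gata",
    standalone_forms={"gata", "gatan", "gator", "gatorna"},
    compound_forms={"gat", "gatu"},
)


def is_valid_alone(entry, form):
    """True if the form may be written as a word on its own."""
    return form in entry.standalone_forms


def is_valid_as_first_part(entry, form):
    """True if the form may start a compound word."""
    return form in entry.compound_forms
```

Both compound forms are accepted as first parts without any attempt to
choose between them, in line with ignoring questions of meaning.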

Compounds of compounds [da, fo, se]
-----------------------------------

Since a compound word is also a word, it can again be used to form a
new and even longer compound word.

An example of this is the Danish word "bananrepublikpræsident", which
is composed of the words "bananrepublik" and "præsident", and where
"bananrepublik" again is a compound word.

We have been unable to find examples where the order in which the words
are joined ("bananrepublik" + "præsident" vs. "banan" +
"republikpræsident") matters for the spelling, but it may still be
the case in some languages.

The thing to notice is that all but the last word will be in their
compound forms. 
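
A recursive check is one way to handle compounds of compounds.  The
following is a minimal sketch under the assumption stated above, namely
that every part except the last must be a compound form.  The function
name segmentations and the two small sets are made up for the example.

```python
def segmentations(word, compound_forms, words):
    """Return all ways to split word into compound forms plus a final word.

    Every part except the last must be in compound_forms, and the last
    part must be in words.  An unsplit word counts as one segmentation.
    """
    results = []
    if word in words:
        results.append([word])
    for i in range(1, len(word)):
        head, rest = word[:i], word[i:]
        if head in compound_forms:
            # Recurse on the remainder, which may itself be a compound.
            for tail in segmentations(rest, compound_forms, words):
                results.append([head] + tail)
    return results


# Toy data, assumed for illustration only.
compound_forms = {"banan", "republik"}
words = {"banan", "republik", "præsident"}

print(segmentations("bananrepublikpræsident", compound_forms, words))
# [['banan', 'republik', 'præsident']]
```

The flat list of parts does not record the order in which the parts
were joined, which matches the observation that the order does not seem
to affect the spelling.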

Compounds of proper names and other words [da]
----------------------------------------------

If the compound is a proper name, it is spelled with a capital first
letter.  If it is not a name, capitalizing the first letter is
optional.  In addition to this general rule, there are cases where it
is incorrect to capitalize the first letter.

Notice that the proper name used in the compound can be either the
first or the second part of the compound word.

Hyphens [da]
------------

In Danish, a hyphen may be added between the components of a compound
word for clarity.

Hyphens are always used in compounds where one part is an abbreviation.

Hyphens are always used in compounds of words of equal priority (where
the words are not in their compound form), e.g. 'rød-hvide' (red/white).

In the case of compounds of compounds, if a hyphen is used in one of the
components, a hyphen must also be used when forming the new compound.
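
The hyphen rules above can be written as a small decision function.
This is a sketch only.  The function name hyphen_requirement and its
flags are invented for the example, the caller is assumed to know
whether a part is an abbreviation, and the sample inputs other than
'rød-hvide' are placeholders rather than claims about Danish spelling.

```python
def hyphen_requirement(first, second, *, abbreviation=False, coordinate=False):
    """Return "required" or "optional" for the join between two parts.

    abbreviation: one of the parts is an abbreviation.
    coordinate:   the parts are words of equal priority
                  (not in their compound form).
    """
    if abbreviation or coordinate:
        return "required"
    if "-" in first or "-" in second:
        # A component that already contains a hyphen forces one here too.
        return "required"
    # Otherwise a hyphen may still be added for clarity.
    return "optional"


print(hyphen_requirement("rød", "hvide", coordinate=True))     # required
print(hyphen_requirement("tv", "program", abbreviation=True))  # required
print(hyphen_requirement("tv-program", "oversigt"))            # required
print(hyphen_requirement("radio", "cykel"))                    # optional
```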

Formal rules [da]
-----------------

Most compound words are produced by rules like these:

  <noun>      <- <compound form of noun> + <noun>
  <noun>      <- <compound form of adjective> + <noun>
  <noun>      <- <adverb> + <noun>
  <adjective> <- <compound form of noun> + <adjective>
  <adjective> <- <compound form of adjective> + <adjective>
  <name>      <- <compound form of name> + <noun>
  <name>      <- <compound form of adjective> + <name>

  <hyphenated noun> <- <abbreviation> + "-" + <noun>
  <hyphenated noun> <- <compound form of hyphenated noun> + "-" + <noun>

Since the rules may differ slightly between languages, it may be a
good idea to implement a grammar for specifying how compound words are
created in each language.
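
As a rough illustration of what a per-language grammar could look like,
the rules listed above can be read from plain text into a table.  This
is a sketch under assumptions: the rule notation is simply the one used
in this message, and the names parse_rules and result_classes are
invented for the example.

```python
RULE_TEXT = """
<noun>      <- <compound form of noun> + <noun>
<noun>      <- <compound form of adjective> + <noun>
<noun>      <- <adverb> + <noun>
<adjective> <- <compound form of noun> + <adjective>
<adjective> <- <compound form of adjective> + <adjective>
<name>      <- <compound form of name> + <noun>
<name>      <- <compound form of adjective> + <name>
<hyphenated noun> <- <abbreviation> + "-" + <noun>
<hyphenated noun> <- <compound form of hyphenated noun> + "-" + <noun>
"""


def parse_rules(text):
    """Parse lines of the form '<result> <- <part> + <part>' into tuples."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        lhs, rhs = line.split("<-")
        result = lhs.strip()[1:-1]  # drop the surrounding < >
        # Each part is either <category> or a quoted literal such as "-";
        # in both cases the first and last character are delimiters.
        parts = tuple(part.strip()[1:-1] for part in rhs.split("+"))
        rules.append((result, parts))
    return rules


def result_classes(rules, parts):
    """Return the classes that the given sequence of parts can produce."""
    wanted = tuple(parts)
    return [result for result, rule_parts in rules if rule_parts == wanted]


rules = parse_rules(RULE_TEXT)
print(result_classes(rules, ["compound form of noun", "noun"]))  # ['noun']
print(result_classes(rules, ["abbreviation", "-", "noun"]))      # ['hyphenated noun']
```

Keeping the rules as data rather than code means another language
could supply its own rule text without changes to the checker.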

User adaptability
-----------------

Since some writers handle compound words better than others, it may
be relevant to allow users to switch the automated compound word rules
off (or on).  The default should still be set in the configuration
file for the dictionary.
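
The interplay between the dictionary default and the user's choice
could be resolved as sketched below.  The option name "allow-compounds"
is hypothetical and is not claimed to be an Aspell option.  Treating a
missing dictionary default as "off" is also an assumption of the
sketch.

```python
# Hypothetical option name, used only for this illustration.
OPTION = "allow-compounds"


def compounds_enabled(dictionary_config, user_config):
    """Return True if the compound word rules should be applied.

    An explicit user setting wins.  Otherwise the default from the
    dictionary's configuration is used, falling back to False.
    """
    if OPTION in user_config:
        return user_config[OPTION]
    return dictionary_config.get(OPTION, False)


danish_defaults = {OPTION: True}
print(compounds_enabled(danish_defaults, {}))               # True
print(compounds_enabled(danish_defaults, {OPTION: False}))  # False
```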

-- 
The greatest advantage of Windows is its good support for
Windows programs.
                          -- quote from Niels Andersen (in dk.edb.system.unix)


 

 
 
Questions about the web-pages to <www_admin>. Last modified 2005-08-10, 20:54 CEST