java.lang.Object
  morfologik.fsa.FSA
      morfologik.fsa.CFSA

public final class CFSA
extends FSA

CFSA (Compact Finite State Automaton) binary format implementation. This is a
 slightly reorganized version of FSA5 that produces smaller automata at the
 cost of a (minor) performance penalty.
 
Note: for new code, serialize to CFSA2 instead.
The encoding of the automaton body is as follows.
 ---- FSA header (standard)
 Byte                            Description 
       +-+-+-+-+-+-+-+-+\
     0 | | | | | | | | | +------ '\'
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     1 | | | | | | | | | +------ 'f'
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     2 | | | | | | | | | +------ 's'
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     3 | | | | | | | | | +------ 'a'
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     4 | | | | | | | | | +------ version (fixed 0xc5)
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     5 | | | | | | | | | +------ filler character
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     6 | | | | | | | | | +------ annot character
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     7 |C|C|C|C|G|G|G|G| +------ C - node data size (ctl), G - address size (gotoLength)
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
  8-32 | | | | | | | | | +------ labels mapped for type (1) of arc encoding. 
       : : : : : : : : : |
       +-+-+-+-+-+-+-+-+/
 
 ---- Start of a node; only if automaton was compiled with NUMBERS option.
 
 Byte
        +-+-+-+-+-+-+-+-+\
      0 | | | | | | | | | \  LSB
        +-+-+-+-+-+-+-+-+  +
      1 | | | | | | | | |  |      number of strings recognized
        +-+-+-+-+-+-+-+-+  +----- by the automaton starting
        : : : : : : : : :  |      from this node.
        +-+-+-+-+-+-+-+-+  +
  ctl-1 | | | | | | | | | /  MSB
        +-+-+-+-+-+-+-+-+/
        
 ---- A vector of node's arcs. Conditional format, depending on flags.
 
 1) NEXT bit set, mapped arc label. 
 
                +--------------- arc's label mapped in M bits if M's field value > 0
                | +------------- node pointed to is next
                | | +----------- the last arc of the node
         _______| | | +--------- the arc is final
        /       | | | |
       +-+-+-+-+-+-+-+-+\
     0 |M|M|M|M|M|1|L|F| +------ flags + (M) index of the mapped label.
       +-+-+-+-+-+-+-+-+/
 
 2) NEXT bit set, label separate.
 
                +--------------- arc's label stored separately (M's field is zero).
                | +------------- node pointed to is next
                | | +----------- the last arc of the node
                | | | +--------- the arc is final
                | | | |
       +-+-+-+-+-+-+-+-+\
     0 |0|0|0|0|0|1|L|F| +------ flags
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     1 | | | | | | | | | +------ label
       +-+-+-+-+-+-+-+-+/
 
 3) NEXT bit not set. Full arc.
 
                  +------------- node pointed to is next
                  | +----------- the last arc of the node
                  | | +--------- the arc is final
                  | | |
       +-+-+-+-+-+-+-+-+\
     0 |A|A|A|A|A|0|L|F| +------ flags + (A) address field, lower bits
       +-+-+-+-+-+-+-+-+/
       +-+-+-+-+-+-+-+-+\
     1 | | | | | | | | | +------ label
       +-+-+-+-+-+-+-+-+/
       : : : : : : : : :       
       +-+-+-+-+-+-+-+-+\
 gtl-1 |A|A|A|A|A|A|A|A| +------ address, continuation (MSB)
       +-+-+-+-+-+-+-+-+/
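
The fixed part of the header can be read with a few byte operations. The following is a minimal sketch that follows the byte layout above. The class name `CfsaHeaderSketch` and its field names are hypothetical and not part of morfologik, and the real `CFSA` constructor may read the header differently.

```java
/** Hypothetical helper (not part of morfologik): decodes header bytes 0-7 as laid out above. */
final class CfsaHeaderSketch {
    /** Version byte taken from the layout above ("fixed 0xc5"). */
    static final int VERSION = 0xc5;

    final byte filler;         // byte 5
    final byte annotation;     // byte 6
    final int nodeDataLength;  // C: upper four bits of byte 7 (ctl)
    final int gotoLength;      // G: lower four bits of byte 7 (gtl)

    CfsaHeaderSketch(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 header bytes");
        }
        if (header[0] != '\\' || header[1] != 'f' || header[2] != 's' || header[3] != 'a') {
            throw new IllegalArgumentException("Missing \\fsa magic bytes");
        }
        if ((header[4] & 0xff) != VERSION) {
            throw new IllegalArgumentException("Unexpected version byte");
        }
        filler = header[5];
        annotation = header[6];
        // Mask after shifting so that a negative byte value does not leak sign bits.
        nodeDataLength = (header[7] >>> 4) & 0x0f;
        gotoLength = header[7] & 0x0f;
    }

    public static void main(String[] args) {
        // Made-up header: filler '_', annotation '+', ctl = 1, gtl = 3.
        byte[] header = {'\\', 'f', 's', 'a', (byte) 0xc5, '_', '+', 0x13};
        CfsaHeaderSketch h = new CfsaHeaderSketch(header);
        System.out.println("ctl=" + h.nodeDataLength + " gtl=" + h.gotoLength);
    }
}
```

The sample header bytes are invented for illustration. The label mapping table that follows byte 7 is deliberately left out of the sketch.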
 
| Field Summary | |
|---|---|
| byte[] | arcs: An array of bytes with the internal representation of the automaton. | 
| static int | BIT_FINAL_ARC: Bitmask indicating that an arc corresponds to the last character of a sequence available when building the automaton. | 
| static int | BIT_LAST_ARC: Bitmask indicating that an arc is the last one of the node's list and the following one belongs to another node. | 
| static int | BIT_TARGET_NEXT: Bitmask indicating that the target node of this arc follows it in the compressed automaton structure (no goto field). | 
| int | gtl: Number of bytes each address takes in full, expanded form (goto length). | 
| byte[] | labelMapping: Label mapping for arcs of type (1) (see class documentation). | 
| int | nodeDataLength: The length of the node header structure (if the automaton was compiled with the NUMBERS option). | 
| static byte | VERSION: Automaton header version value. | 
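
The three bitmask fields correspond to the low bits of an arc's flags byte shown in the class description. The sketch below decodes one such byte. The constant values are an assumption read off the diagrams (F in bit 0, L in bit 1, NEXT in bit 2), since this page does not list them, and `ArcFlagsSketch` is a hypothetical helper rather than part of the library.

```java
/** Hypothetical helper (not part of morfologik): decodes one flags byte of an arc. */
final class ArcFlagsSketch {
    // Assumed values, read off the diagrams in the class description
    // (|x|x|x|x|x|NEXT|L|F|); the real constants' values are not shown on this page.
    static final int BIT_FINAL_ARC = 1 << 0;
    static final int BIT_LAST_ARC = 1 << 1;
    static final int BIT_TARGET_NEXT = 1 << 2;

    static boolean isFinal(byte flags) { return (flags & BIT_FINAL_ARC) != 0; }
    static boolean isLast(byte flags) { return (flags & BIT_LAST_ARC) != 0; }
    static boolean isNextSet(byte flags) { return (flags & BIT_TARGET_NEXT) != 0; }

    /** Upper five bits: the mapped-label index (M) or the low address bits (A). */
    static int upperFiveBits(byte flags) { return (flags >>> 3) & 0x1f; }

    /** Returns 1, 2 or 3, matching the numbered arc encodings in the class description. */
    static int arcType(byte flags) {
        if (!isNextSet(flags)) {
            return 3;                              // full arc with an address field
        }
        return upperFiveBits(flags) > 0 ? 1 : 2;   // mapped label vs. separate label byte
    }

    public static void main(String[] args) {
        byte flags = (byte) 0b0001_0101; // M = 2, NEXT set, not last, final
        System.out.println("type=" + arcType(flags) + " M=" + upperFiveBits(flags)
                + " last=" + isLast(flags) + " final=" + isFinal(flags));
    }
}
```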
| Constructor Summary | |
|---|---|
| CFSA(java.io.InputStream fsaStream): Creates a new automaton, reading it from an input stream in CFSA format (header version byte 0xc5). | |
| Method Summary | |
|---|---|
| int | getArc(int node, byte label) | 
| byte | getArcLabel(int arc): Returns the label associated with a given arc. | 
| int | getEndNode(int arc): Returns the end node pointed to by a given arc. | 
| int | getFirstArc(int node) | 
| java.util.Set<FSAFlags> | getFlags(): Returns a set of flags for this FSA instance. | 
| int | getNextArc(int arc) | 
| int | getRightLanguageCount(int node) | 
| int | getRootNode(): Returns the start node of this automaton. | 
| boolean | isArcFinal(int arc): Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton. | 
| boolean | isArcLast(int arc): Returns true if this arc has the LAST bit set. | 
| boolean | isArcTerminal(int arc): Returns true if this arc does not have a terminating node (FSA.getEndNode(int) will throw an exception). | 
| boolean | isLabelCompressed(int arc): Returns true if the label is compressed inside the flags byte. | 
| boolean | isNextSet(int arc) | 
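
The arc-level methods above are enough to enumerate every sequence stored in an automaton. The sketch below shows one way to do that with a depth-first walk. To stay self-contained it does not use the real classes: `ArcApi` is a hypothetical interface that copies a subset of the method names and signatures from the summary, and `ToyAutomaton` is a made-up in-memory automaton accepting "ab" and "ac".

```java
import java.util.ArrayList;
import java.util.List;

final class TraversalSketch {
    /** Collects all accepted sequences, treating each label as an ASCII character. */
    static List<String> sequences(ArcApi fsa) {
        List<String> out = new ArrayList<>();
        collect(fsa, fsa.getRootNode(), new StringBuilder(), out);
        return out;
    }

    private static void collect(ArcApi fsa, int node, StringBuilder prefix, List<String> out) {
        // An arc identifier of 0 means "no (more) arcs".
        for (int arc = fsa.getFirstArc(node); arc != 0; arc = fsa.getNextArc(arc)) {
            prefix.append((char) fsa.getArcLabel(arc));
            if (fsa.isArcFinal(arc)) {
                out.add(prefix.toString());
            }
            if (!fsa.isArcTerminal(arc)) {
                // Only non-terminal arcs have an end node to descend into.
                collect(fsa, fsa.getEndNode(arc), prefix, out);
            }
            prefix.setLength(prefix.length() - 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(sequences(new ToyAutomaton())); // prints [ab, ac]
    }
}

/** Hypothetical subset of the methods summarized above; not the real morfologik types. */
interface ArcApi {
    int getRootNode();
    int getFirstArc(int node);
    int getNextArc(int arc);
    byte getArcLabel(int arc);
    int getEndNode(int arc);
    boolean isArcFinal(int arc);
    boolean isArcTerminal(int arc);
}

/** Toy automaton accepting "ab" and "ac"; arrays are indexed by arc (or node) identifier. */
final class ToyAutomaton implements ArcApi {
    private final byte[] labels = {0, 'a', 'b', 'c'};
    private final boolean[] finalArc = {false, false, true, true};
    private final boolean[] lastArc = {false, true, false, true};
    private final int[] target = {0, 2, 0, 0}; // 0 marks a terminal arc
    private final int[] firstArc = {0, 1, 2};  // indexed by node

    public int getRootNode() { return 1; }
    public int getFirstArc(int node) { return firstArc[node]; }
    public int getNextArc(int arc) { return lastArc[arc] ? 0 : arc + 1; }
    public byte getArcLabel(int arc) { return labels[arc]; }
    public boolean isArcFinal(int arc) { return finalArc[arc]; }
    public boolean isArcTerminal(int arc) { return target[arc] == 0; }
    public int getEndNode(int arc) {
        if (isArcTerminal(arc)) {
            throw new IllegalStateException("Terminal arc has no end node: " + arc);
        }
        return target[arc];
    }
}
```

Checking `isArcTerminal` before calling `getEndNode` mirrors the documented behaviour that terminal arcs throw a runtime exception when asked for their end node.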
| Methods inherited from class morfologik.fsa.FSA | 
|---|
| getArcCount, getSequences, getSequences, iterator, read, visitAllStates, visitInPostOrder, visitInPostOrder, visitInPreOrder, visitInPreOrder | 
| Methods inherited from class java.lang.Object | 
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
| Field Detail | 
|---|
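public static final byte VERSION
Automaton header version value.

public static final int BIT_FINAL_ARC
Bitmask indicating that an arc corresponds to the last character of a sequence available when building the automaton.

public static final int BIT_LAST_ARC
Bitmask indicating that an arc is the last one of the node's list and the following one belongs to another node.

public static final int BIT_TARGET_NEXT
Bitmask indicating that the target node of this arc follows it in the compressed automaton structure (no goto field).

public byte[] arcs
An array of bytes with the internal representation of the automaton.

public final int nodeDataLength
The length of the node header structure (if the automaton was compiled with the NUMBERS option). Otherwise zero.

public final int gtl
Number of bytes each address takes in full, expanded form (goto length).

public final byte[] labelMapping
Label mapping for arcs of type (1) (see class documentation).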
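
When an automaton is compiled with the NUMBERS option, each node starts with a count stored least significant byte first, as shown in the class description. The sketch below decodes such a field. `NodeCountSketch` and `decodeLsbFirst` are hypothetical names, and the limit of four bytes is a simplification so that the result fits in an `int`.

```java
/** Hypothetical helper (not part of morfologik): reads an LSB-first integer field. */
final class NodeCountSketch {
    /**
     * Decodes {@code length} bytes starting at {@code offset}, least significant
     * byte first. Limited to 4 bytes here; wider fields would need a long.
     */
    static int decodeLsbFirst(byte[] data, int offset, int length) {
        if (length < 0 || length > 4) {
            throw new IllegalArgumentException("This sketch supports 0..4 bytes");
        }
        int value = 0;
        // Walk from the most significant byte (highest index) down to the LSB.
        for (int i = length - 1; i >= 0; i--) {
            value = (value << 8) | (data[offset + i] & 0xff);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] node = {0x2a, 0x01, 0x00}; // made-up node data, 3 bytes wide
        System.out.println(decodeLsbFirst(node, 0, 3)); // prints 298
    }
}
```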
| Constructor Detail | 
|---|
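public CFSA(java.io.InputStream fsaStream)
     throws java.io.IOException
Creates a new automaton, reading it from an input stream in CFSA format (header version byte 0xc5).
Throws:
java.io.IOException

| Method Detail | 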
|---|
public int getRootNode()
Returns the start node of this automaton, or 0 if the start node is also an end node.
Corresponds to: getRootNode in class FSA

public final int getFirstArc(int node)
Corresponds to: getFirstArc in class FSA
Returns: the identifier of the first arc leaving node, or 0 if the node has no outgoing arcs.

public final int getNextArc(int arc)
Corresponds to: getNextArc in class FSA
Returns: the identifier of the next arc after arc and leaving node. Zero is returned if no more arcs are available for the node.

public int getArc(int node,
                  byte label)
Corresponds to: getArc in class FSA
Returns: the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.

public int getEndNode(int arc)
Returns the end node pointed to by a given arc. Terminal arcs
 (those that point to a terminal state) have no end node representation
 and throw a runtime exception.
Corresponds to: getEndNode in class FSA

public byte getArcLabel(int arc)
Returns the label associated with a given arc.
Corresponds to: getArcLabel in class FSA

public int getRightLanguageCount(int node)
Corresponds to: getRightLanguageCount in class FSA
Returns: the number of sequences recognized starting from the given node, if the automaton was compiled with FSAFlags.NUMBERS. The size of
 the right language of the state, in other words.

public boolean isArcFinal(int arc)
Returns true if the destination node at the end of this
 arc corresponds to an input sequence created when building
 this automaton.
Corresponds to: isArcFinal in class FSA

public boolean isArcTerminal(int arc)
Returns true if this arc does not have a
 terminating node (FSA.getEndNode(int) will throw an
 exception). Implies FSA.isArcFinal(int).
Corresponds to: isArcTerminal in class FSA

public boolean isArcLast(int arc)
Returns true if this arc has the LAST bit set.
See Also: BIT_LAST_ARC

public boolean isNextSet(int arc)
See Also: BIT_TARGET_NEXT

public boolean isLabelCompressed(int arc)
Returns true if the label is compressed inside the flags byte.

public java.util.Set<FSAFlags> getFlags()
Returns a set of flags for this FSA instance. For this automaton version, an additional FSAFlags.NUMBERS flag
 may be set to indicate the automaton contains extra fields for each node.
Corresponds to: getFlags in class FSA
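
A lookup in the style of `getArc(node, label)` can be expressed on top of the arc-iteration contract, where an identifier of 0 means that no arc exists. The sketch below is a plain linear scan and is not claimed to be how `CFSA` implements the method. `GetArcSketch`, `findArc`, and `demoLookup` are hypothetical names, and the three lambdas stand in for `getFirstArc`, `getNextArc`, and `getArcLabel` on a made-up node with arcs 'a', 'b', 'c'.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical helper (not part of morfologik): label lookup by scanning a node's arcs. */
final class GetArcSketch {
    /** Returns the first arc of {@code node} carrying {@code label}, or 0 if there is none. */
    static int findArc(int node, byte label,
                       IntUnaryOperator firstArc,
                       IntUnaryOperator nextArc,
                       IntUnaryOperator labelOf) {
        for (int arc = firstArc.applyAsInt(node); arc != 0; arc = nextArc.applyAsInt(arc)) {
            if ((byte) labelOf.applyAsInt(arc) == label) {
                return arc;
            }
        }
        return 0;
    }

    /** Runs the lookup against toy data: node 1 owns arcs 1..3 labeled 'a', 'b', 'c'. */
    static int demoLookup(byte label) {
        byte[] labels = {0, 'a', 'b', 'c'};
        return findArc(1, label,
                node -> node == 1 ? 1 : 0,     // stand-in for getFirstArc
                arc -> arc < 3 ? arc + 1 : 0,  // stand-in for getNextArc
                arc -> labels[arc]);           // stand-in for getArcLabel
    }

    public static void main(String[] args) {
        System.out.println(demoLookup((byte) 'b') + " " + demoLookup((byte) 'z')); // prints 2 0
    }
}
```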