Class MatchStarTables


  • public class MatchStarTables
    extends java.lang.Object
    Provides factory methods for producing tables which represent the result of row matching.

    The methods in this class operate on Collection<RowLink>s rather than on LinkSets, to emphasise that they do not modify the contents of the collections. Such collections will typically be sorted into their natural sequence, see orderLinks(uk.ac.starlink.table.join.LinkSet).

    Author:
    Mark Taylor (Starlink)
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static uk.ac.starlink.table.ValueInfo GRP_ID_INFO
      Defines the characteristics of a table column which represents the ID of a group of matched row objects.
      static uk.ac.starlink.table.ValueInfo GRP_SIZE_INFO
      Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
    • Constructor Summary

      Constructors 
      Constructor Description
      MatchStarTables()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static java.util.Map<RowLink,​LinkGroup> findGroups​(java.util.Collection<RowLink> links)
      Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input collection.
      static uk.ac.starlink.table.StarTable makeInternalMatchTable​(int iTable, java.util.Collection<RowLink> rowLinks, long rowCount)
      Analyses a set of RowLinks to mark as linked rows of a given table.
      static uk.ac.starlink.table.StarTable makeJoinTable​(uk.ac.starlink.table.StarTable[] tables, java.util.Collection<RowLink> rowLinks, boolean addGroups, uk.ac.starlink.table.JoinFixAction[] fixActs, uk.ac.starlink.table.ValueInfo matchScoreInfo)
      Constructs a table made out of a set of constituent tables joined together according to a set of RowLinks describing row matches.
      static uk.ac.starlink.table.StarTable makeSequentialJoinTable​(uk.ac.starlink.table.StarTable[] tables, java.util.Collection<RowLink> rowLinks, uk.ac.starlink.table.JoinFixAction[] fixActs, uk.ac.starlink.table.ValueInfo matchScoreInfo)
      Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a RowLink collection.
      static java.util.Collection<RowLink> orderLinks​(LinkSet linkSet)
      Best-efforts conversion of a LinkSet, which is what RowMatcher outputs, to a Collection of RowLinks, which is what this class uses.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • GRP_ID_INFO

        public static final uk.ac.starlink.table.ValueInfo GRP_ID_INFO
        Defines the characteristics of a table column which represents the ID of a group of matched row objects.
      • GRP_SIZE_INFO

        public static final uk.ac.starlink.table.ValueInfo GRP_SIZE_INFO
        Defines the characteristics of a table column which represents the number of matched row objects in a given group (with the same group ID).
    • Constructor Detail

      • MatchStarTables

        public MatchStarTables()
    • Method Detail

      • makeJoinTable

        public static uk.ac.starlink.table.StarTable makeJoinTable​(uk.ac.starlink.table.StarTable[] tables,
                                                                   java.util.Collection<RowLink> rowLinks,
                                                                   boolean addGroups,
                                                                   uk.ac.starlink.table.JoinFixAction[] fixActs,
                                                                   uk.ac.starlink.table.ValueInfo matchScoreInfo)
        Constructs a table made out of a set of constituent tables joined together according to a set of RowLinks describing row matches. The columns of the resulting table are made by appending the columns of the constituent tables side by side. Each row in the resulting table corresponds to one RowLink entry in rowLinks; if that RowLink contains a row from one of the tables being joined here, the columns corresponding to that table are filled in. If it contains multiple rows from that table, an arbitrary one of them is filled in.

        The tables array determines which tables' columns appear in the output table. It must have at least one more element than the highest table index in the RowLink set. Table data will be picked from the nth table in this array for RowRef elements with a tableIndex of n. If the nth element is null, the corresponding columns will not appear in the output table.

        The matchScoreInfo parameter is optional. If it is non-null, then an additional column, described by matchScoreInfo, will be added to the table containing the score values from the RowLinks in rowLinks. The content class of matchScoreInfo should be Number or one of its subclasses.

        Parameters:
        tables - array of constituent tables
        rowLinks - set of RowLink objects which define which rows in one table are associated with which rows in the others
        addGroups - flag which indicates whether the output table should, if appropriate, include GRP_ID_INFO and GRP_SIZE_INFO columns
        fixActs - actions to take for deduplicating column names (array of the same length as tables)
        matchScoreInfo - may supply information about the meaning of the link scores
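
        The side-by-side layout described above can be illustrated with a minimal sketch. It does not use the STIL classes: Ref, Table and join are hypothetical stand-ins invented for this example, and the sketch ignores fixActs, addGroups and matchScoreInfo. Where a link holds several rows from one table, the sketch arbitrarily keeps the first one it meets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the STIL classes.
final class JoinSketch {
    record Ref(int tableIndex, int rowIndex) {}
    record Table(int ncol, List<Object[]> rows) {}

    /** Builds one output row per link, placing each table's cells side by side. */
    static List<Object[]> join(Table[] tables, List<List<Ref>> links) {
        // Column offset of each table in the output; null tables take no columns.
        int[] offsets = new int[tables.length];
        int width = 0;
        for (int t = 0; t < tables.length; t++) {
            offsets[t] = width;
            if (tables[t] != null) {
                width += tables[t].ncol();
            }
        }
        List<Object[]> out = new ArrayList<>();
        for (List<Ref> link : links) {
            Object[] row = new Object[width];          // unfilled cells stay null
            boolean[] filled = new boolean[tables.length];
            for (Ref ref : link) {
                int t = ref.tableIndex();
                Table table = tables[t];
                if (table == null || filled[t]) {
                    continue;                          // skipped table, or already filled
                }
                System.arraycopy(table.rows().get(ref.rowIndex()), 0,
                                 row, offsets[t], table.ncol());
                filled[t] = true;
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        Table t0 = new Table(2, List.of(new Object[] {"a", 1}, new Object[] {"b", 2}));
        Table t1 = new Table(1, List.of(new Object[] {10.0}, new Object[] {20.0}));
        List<List<Ref>> links = List.of(
            List.of(new Ref(0, 0), new Ref(1, 1)),
            List.of(new Ref(0, 1)));
        for (Object[] row : join(new Table[] {t0, t1}, links)) {
            System.out.println(Arrays.toString(row));
        }
        // prints [a, 1, 20.0] then [b, 2, null]
    }
}
```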
      • makeSequentialJoinTable

        public static uk.ac.starlink.table.StarTable makeSequentialJoinTable​(uk.ac.starlink.table.StarTable[] tables,
                                                                             java.util.Collection<RowLink> rowLinks,
                                                                             uk.ac.starlink.table.JoinFixAction[] fixActs,
                                                                             uk.ac.starlink.table.ValueInfo matchScoreInfo)
        Constructs a non-random table made out of a set of possibly non-random constituent tables joined together according to a RowLink collection. Any input tables which do not have random access must have row ordering consistent with (that is, monotonically increasing for) the ordering of the links. In practice, this is only likely to be the case if all the input tables are random access except for (at most) one, and the links are ordered with reference to that one. If this requirement is not met, sequential access to the resulting table is likely to fail at some point.
        Parameters:
        tables - array of constituent tables
        rowLinks - link set defining the match
        fixActs - actions to take for deduplicating column names (array of the same size as tables)
        matchScoreInfo - may supply information about the meaning of the match scores, if present
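
        The ordering requirement can be checked ahead of time. The following is a minimal sketch under stated assumptions: Ref and isConsistent are hypothetical names, not part of STIL, and the sketch treats a repeated row index as acceptable (non-decreasing order), which the description above does not spell out.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the STIL classes.
final class SequentialCheckSketch {
    record Ref(int tableIndex, long rowIndex) {}

    /**
     * Returns true if the rows of table iTable are visited in
     * non-decreasing order when the links are read in sequence.
     */
    static boolean isConsistent(int iTable, List<List<Ref>> links) {
        long last = -1;
        for (List<Ref> link : links) {
            for (Ref ref : link) {
                if (ref.tableIndex() != iTable) {
                    continue;          // other tables do not affect this check
                }
                if (ref.rowIndex() < last) {
                    return false;      // a sequential reader would have to go backwards
                }
                last = ref.rowIndex();
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Ref>> links = List.of(
            List.of(new Ref(0, 5), new Ref(1, 0)),
            List.of(new Ref(0, 2), new Ref(1, 3)));
        System.out.println(isConsistent(0, links)); // false: row 5 precedes row 2
        System.out.println(isConsistent(1, links)); // true: rows 0 then 3
    }
}
```

        In this example only table 1 could safely be a sequential-access input; table 0 would need random access.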
      • makeInternalMatchTable

        public static uk.ac.starlink.table.StarTable makeInternalMatchTable​(int iTable,
                                                                            java.util.Collection<RowLink> rowLinks,
                                                                            long rowCount)
        Analyses a set of RowLinks to mark as linked rows of a given table. The result of this method is a two-column table whose rows correspond one-to-one with the rows of the table referenced in the link set. The output columns are defined by the constants GRP_ID_INFO and GRP_SIZE_INFO. Rows of the table linked together by rowLinks are assigned the same integer value in the new GRP_ID_INFO column, and the GRP_SIZE_INFO column indicates how many rows are linked together in this way. Each group corresponds to a single RowLink; if a row is part of more than one RowLink then only one of them will be recorded in the new columns. Any rows linked in rowLinks which do not refer to table iTable have null entries in these columns.
        Parameters:
        iTable - the index of the table in which internal matches are to be sought
        rowLinks - a collection of RowLink objects linking groups of rows together
        rowCount - number of rows in the returned table (must be large enough to accommodate the indices in rowLinks)
        Returns:
        a new two-column table, with a one-to-one row correspondence with the table, describing internal row matches
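
        The two output columns can be pictured with a minimal sketch. It uses hypothetical stand-ins (Ref, groupColumns) rather than the STIL classes, and makes several assumptions not stated above: group IDs start at 1, only links with at least two rows in table iTable count as groups, the last link seen wins when a row appears in several links, rows in no group are left null, and rowCount is an int rather than a long.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not the STIL classes.
final class InternalMatchSketch {
    record Ref(int tableIndex, int rowIndex) {}

    /** Returns {groupIds, groupSizes}, each of length rowCount; null marks an ungrouped row. */
    static Integer[][] groupColumns(int iTable, List<List<Ref>> links, int rowCount) {
        Integer[] ids = new Integer[rowCount];
        Integer[] sizes = new Integer[rowCount];
        int nextId = 1;                                   // assumed starting ID
        for (List<Ref> link : links) {
            // Distinct rows of this link that belong to table iTable.
            int[] rows = link.stream()
                .filter(ref -> ref.tableIndex() == iTable)
                .mapToInt(Ref::rowIndex)
                .distinct()
                .toArray();
            if (rows.length < 2) {
                continue;                                 // assumed: no group for a lone row
            }
            int id = nextId++;
            for (int row : rows) {
                ids[row] = id;                            // a later link overwrites an earlier one
                sizes[row] = rows.length;
            }
        }
        return new Integer[][] {ids, sizes};
    }

    public static void main(String[] args) {
        List<List<Ref>> links = List.of(
            List.of(new Ref(0, 0), new Ref(0, 2)),
            List.of(new Ref(0, 3), new Ref(0, 4), new Ref(0, 5)));
        Integer[][] cols = groupColumns(0, links, 7);
        System.out.println(Arrays.toString(cols[0])); // [1, null, 1, 2, 2, 2, null]
        System.out.println(Arrays.toString(cols[1])); // [2, null, 2, 3, 3, 3, null]
    }
}
```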
      • findGroups

        public static java.util.Map<RowLink,​LinkGroup> findGroups​(java.util.Collection<RowLink> links)
        Returns a mapping from RowLinks to LinkGroups which describes connected groups of links in the input collection. A connected group is one in which the RowRefs of its constituent RowLinks form a connected graph in which RowRefs are the nodes and RowLinks are the edges. A LinkGroup with a link count of more than one therefore represents an ambiguous match, that is, one in which one or more of its RowRefs is contained in more than one RowLink in the original RowLink collection.

        The returned map contains entries only for non-trivial LinkGroups, that is ones which contain more than one link.

        Parameters:
        links - link set representing a set of matches
        Returns:
        RowLink -> LinkGroup mapping describing connected groups in links
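
        One way to find such connected groups is a union-find pass over the row references. The sketch below illustrates the idea only and is not claimed to be how this method is implemented: Ref, Link and the List<Link> used in place of LinkGroup are hypothetical stand-ins, and every link is assumed to hold at least one reference.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are not the STIL classes.
final class GroupSketch {
    record Ref(int tableIndex, long rowIndex) {}
    record Link(Set<Ref> refs) {}

    /** Follows parent pointers to the representative of the set containing r. */
    private static Ref find(Map<Ref, Ref> parent, Ref r) {
        Ref root = r;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        parent.put(r, root);   // shortcut for later lookups
        return root;
    }

    /** Maps each link in a multi-link connected group to the list of links in that group. */
    static Map<Link, List<Link>> findGroups(List<Link> links) {
        // Union all references that share a link.
        Map<Ref, Ref> parent = new HashMap<>();
        for (Link link : links) {
            Ref first = null;
            for (Ref ref : link.refs()) {
                parent.putIfAbsent(ref, ref);
                if (first == null) {
                    first = ref;
                } else {
                    parent.put(find(parent, ref), find(parent, first));
                }
            }
        }
        // Bucket links by the representative of any one of their references.
        Map<Ref, List<Link>> byRoot = new LinkedHashMap<>();
        for (Link link : links) {
            Ref root = find(parent, link.refs().iterator().next());
            byRoot.computeIfAbsent(root, k -> new ArrayList<>()).add(link);
        }
        // Keep only non-trivial groups, those with more than one link.
        Map<Link, List<Link>> result = new LinkedHashMap<>();
        for (List<Link> group : byRoot.values()) {
            if (group.size() > 1) {
                for (Link link : group) {
                    result.put(link, group);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Link a = new Link(Set.of(new Ref(0, 1), new Ref(1, 5)));
        Link b = new Link(Set.of(new Ref(0, 1), new Ref(1, 6)));  // shares row (0, 1) with a
        Link c = new Link(Set.of(new Ref(0, 2), new Ref(1, 7)));  // unambiguous
        Map<Link, List<Link>> groups = findGroups(List.of(a, b, c));
        System.out.println(groups.size());          // 2: entries for a and b only
        System.out.println(groups.containsKey(c));  // false
    }
}
```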
      • orderLinks

        public static java.util.Collection<RowLink> orderLinks​(LinkSet linkSet)
        Best-efforts conversion of a LinkSet, which is what RowMatcher outputs, to a Collection of RowLinks, which is what this class uses. This essentially calls LinkSet.toSorted(), but if that fails for lack of memory (unlikely, but possible) it writes a message through the logging system and returns an unordered result instead.
        Parameters:
        linkSet - unordered LinkSet
        Returns:
        input links as a collection, but if possible in natural order
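
        The fallback behaviour can be sketched as a sort attempt guarded by a catch of OutOfMemoryError. This is an illustration under stated assumptions, not the STIL source: OrderSketch and order are hypothetical names, any Comparable element type stands in for RowLink, and the log message text is invented.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical stand-in for illustration; this is not the STIL implementation.
final class OrderSketch {
    private static final Logger LOGGER = Logger.getLogger(OrderSketch.class.getName());

    /** Returns the items in natural order if a sorted copy can be made, else the input as is. */
    static <T extends Comparable<? super T>> Collection<T> order(Collection<T> items) {
        try {
            List<T> sorted = new ArrayList<>(items);   // the copy is what may exhaust memory
            Collections.sort(sorted);
            return sorted;
        } catch (OutOfMemoryError e) {
            // Best efforts: report the problem and carry on with unordered data.
            LOGGER.warning("Not enough memory to sort; returning an unordered result");
            return items;
        }
    }

    public static void main(String[] args) {
        System.out.println(order(Set.of(3, 1, 2))); // [1, 2, 3]
    }
}
```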