Each month a date-stamped directory is created and populated with some defaults.
(alpha) Some months, sets of Dipper's RDF files are post-processed to generate QC graphs.
(beta) Some months, these beta files pass through SciGraph and become the latest Monarch release.

Directory structure may include:

YYYYMM
|-- owl/
|-- owlsim/
|   |-- data/
|-- rdf/
|-- scigraph.tgz
|-- scigraph-ontology.tgz
|-- solr.tgz
|-- translationtable/
|-- tsv/
|   |-- disease_associations/
|   |-- gene_associations/
|   |-- genotype_associations/
|   |-- model_associations/
|   |-- variant_associations/
|-- visual_reduction/
    |-- changes/
    |-- reduced/
    |-- release/

------------------------------------------------------------------

owl/
    Snapshot of the ontology files that the Dipper RDF is integrated with.

owlsim/
    The cache files used to start an owltools owlsim server via

        owltools all.owl --use-fsim --sim-load-lcs-cache owlsim.cache --sim-load-ic-cache ic-cache.owl --start-sim-server

    See https://github.com/owlcollab/owltools/wiki

owlsim/data/
    Data files used to generate the three cache files in the parent owlsim directory.
    See https://github.com/monarch-initiative/monarch-owlsim-data/blob/master/server/Makefile

scigraph.tgz
    Neo4j data dump for the Monarch SciGraph database.
    See https://github.com/SciGraph/SciGraph/wiki

scigraph-ontology.tgz
    Neo4j data dump for the Monarch SciGraph ontology-only database.

solr.tgz
    Solr data dump for all Monarch cores: search, feature location, and GOlr (associations).

rdf/
    Explicit statements in Resource Description Framework (RDF) serializations, along with their metadata.
    .ttl files are in the Turtle format, which groups statements by common subject.
    .nt files are in the N-Triples format, with one atomic statement per line.
    *_dataset.ttl and *_count files are metadata for their associated file.
    Note: the largest ingests tend not to have a Turtle (.ttl) representation.

translationtable/
    Contains snapshots of the namespace mappings and term labels used when forming the RDF.
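Because an N-Triples (.nt) file holds one statement per line, a rough statement count can be taken by counting the lines that are neither blank nor comments. The Python sketch below assumes a well-formed N-Triples file; the function name count_ntriples is hypothetical, and this is not necessarily how the *_count metadata files are produced.

```python
import io
from typing import Iterable


def count_ntriples(lines: Iterable[str]) -> int:
    """Count statements in N-Triples text: one triple per non-blank, non-comment line."""
    count = 0
    for line in lines:
        stripped = line.strip()
        # Blank lines and '#' comment lines carry no statement.
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


# Hypothetical sample: a comment, two triples, and a blank line.
sample = io.StringIO(
    "# example\n"
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
    "\n"
    '<http://example.org/a> <http://example.org/p> "label" .\n'
)
print(count_ntriples(sample))  # 2
```

For a real file, pass an open text handle, e.g. `with open(path, encoding="utf-8") as fh: count_ntriples(fh)`, where path points at one of the .nt files in rdf/.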
tsv/
    TSV-formatted data dumps of specific queries of the Monarch database.
    See the TSV glossary of terms at the end of this file for details about each field.

tsv/disease_associations/
    Disease to phenotype associations.

tsv/gene_associations/
    Gene to disease and gene to phenotype associations.

tsv/genotype_associations/
    Genotype to disease and genotype to phenotype associations.

tsv/model_associations/
    Model to disease associations.

tsv/variant_associations/
    Variant to disease and variant to phenotype associations.

visual_reduction/release/
    Monochromatic representation of every namespace transition (statement type) found in each ingest.
    Numbers on edges represent the total count for that type of statement.

visual_reduction/changes/
    Graphical depiction of every namespace-level difference from the previous release to this release.
    Numbers on edges represent changes in counts for that type of statement.
        - blue: newly added edges
        - orange: old edges that are no longer present
        - black: continuing at the same volume or more
        - pink: continuing but at a decreased volume

visual_reduction/reduced/
    This set is not a complete representation of the ingest, as the other two are.
    Here accuracy is traded for emphasizing the greatest RELATIVE difference.
    No edge counts are displayed; instead, edge width scales with the edge's absolute difference
    in counts between the previous and current release, relative to the greatest such difference.
    Colors are as in visual_reduction/changes/.
    Furthermore, the change in counts must be more than "several" to be considered significant
    enough to include at all, so unchanged parts of the graph fade away.
    Caveats when interpreting these 'reduced' graphs:
        - the width of an edge has nothing to do with any other graph
        - the width of an edge has nothing to do with the count of edges in its graph
    The width of an edge only approximates its relative portion of the largest difference
    observed in the corresponding visual_reduction/changes/ graph.
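The color legend for visual_reduction/changes/ and the width scaling for visual_reduction/reduced/ can be sketched in Python as below. This is a minimal illustration, not the code that draws these graphs. The edge keys (subject namespace, object namespace), the function names, and the min_change threshold standing in for "several" are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical edge key: (subject namespace, object namespace), e.g. ("MGI", "MP").
Edge = Tuple[str, str]


def edge_colors(prev: Dict[Edge, int], curr: Dict[Edge, int]) -> Dict[Edge, Tuple[str, int]]:
    """Map each edge to (color, change in count) following the changes/ legend."""
    result: Dict[Edge, Tuple[str, int]] = {}
    for edge in prev.keys() | curr.keys():
        before = prev.get(edge, 0)
        after = curr.get(edge, 0)
        if edge not in prev:
            color = "blue"  # newly added edge
        elif edge not in curr:
            color = "orange"  # old edge no longer present
        elif after >= before:
            color = "black"  # continuing at the same volume or more
        else:
            color = "pink"  # continuing but at a decreased volume
        result[edge] = (color, after - before)
    return result


def reduced_widths(changes: Dict[Edge, Tuple[str, int]], min_change: int) -> Dict[Edge, float]:
    """Scale each significant change as a fraction of the largest absolute change."""
    # min_change is a hypothetical stand-in for the unspecified "several" threshold.
    significant = {e: abs(d) for e, (_, d) in changes.items() if abs(d) > min_change}
    if not significant:
        return {}  # nothing significant: corresponds to an empty output file
    largest = max(significant.values())
    return {e: d / largest for e, d in significant.items()}
```

For example, with prev = {("MGI", "MP"): 100, ("ZFIN", "ZP"): 50} and curr = {("MGI", "MP"): 120, ("ZFIN", "ZP"): 40}, edge_colors marks the first edge black (+20) and the second pink (-10), and reduced_widths with min_change=5 gives them widths 1.0 and 0.5.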
    In visual_reduction/reduced/, empty files are included for those with no significant differences.

TSV Glossary of Terms

subject: The CURIE-formatted identifier for the subject of the association (gene, disease, variant).
subject_label: Label of the subject of the association.
subject_taxon: Taxonomic class of the subject. This is typically a CURIE of the form NCBITaxon:nnnn.
subject_taxon_label: Label of the subject taxon.
object: The CURIE-formatted identifier for the object of the association (disease, phenotype).
object_label: Label of the object of the association.
object_taxon: Taxonomic class of the object. This is typically a CURIE of the form NCBITaxon:nnnn.
object_taxon_label: Label of the object taxon.
relation: A relationship type that connects the subject with the object.
relation_label: Label for the relation.
evidence: Evidence type. In Monarch we may have a chain of assertions that link two entities/terms; this is a list of all evidence types used in that chain.
evidence_label: Labels for each evidence code.
source: The RDF sources used to create the association.
is_defined_by: Associations are obtained from the source(s) listed. More than one source indicates that the association may be derived by connecting data from multiple sources, or that multiple sources have corroborated the same assertion.
qualifier: Indicates whether the underlying query makes a direct connection or an inference across multiple associations, e.g. gene to phenotype inferred across gene to disease and disease to phenotype.
onset: The age group in which disease manifestations appear.
onset_label: Label for the onset class.
frequency: Class representing the frequency of phenotypic abnormalities within a patient cohort.
frequency_label: Label for the frequency class.
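Files under tsv/ can be read with Python's standard csv module. The sketch below assumes that the first row is a header using the column names from this glossary and that fields are tab-separated without quoting. The function name objects_by_subject and the sample rows (EX: identifiers, "phenotype X" labels) are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def objects_by_subject(handle: TextIO, taxon: str) -> Dict[str, List[str]]:
    """Group object labels by subject CURIE for rows whose subject_taxon equals taxon."""
    # Assumption: header row with glossary column names; tab-separated; no quoting.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        if row["subject_taxon"] == taxon:
            grouped[row["subject"]].append(row["object_label"])
    return dict(grouped)


# Hypothetical sample rows using a subset of the glossary columns.
sample = io.StringIO(
    "subject\tsubject_taxon\tobject\tobject_label\n"
    "EX:1\tNCBITaxon:9606\tEX:10\tphenotype X\n"
    "EX:1\tNCBITaxon:9606\tEX:11\tphenotype Y\n"
    "EX:2\tNCBITaxon:10090\tEX:10\tphenotype X\n"
)
print(objects_by_subject(sample, "NCBITaxon:9606"))
# {'EX:1': ['phenotype X', 'phenotype Y']}
```

For a real dump, pass `open(path, newline="", encoding="utf-8")` instead of the StringIO sample, and check the file's actual header row first, since the exact columns may vary between association types.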