Property graph
A property graph, labeled property graph, or attributed graph is a data model of various graph-oriented databases,[1] where pairs of entities are associated by directed relationships, and entities and relationships can have properties.
In graph theory terms, a property graph is a directed multigraph, whose vertices represent entities and arcs represent relationships. Each arc has an identifier, a source node and a target node, and may have properties.
Properties are key-value pairs where keys are character strings and values are numbers or character strings. They are analogous to attributes in entity-attribute-value and object-oriented modeling. By contrast, in RDF graphs, "properties" is the term for the arcs. This is why a clearer name is attributed graphs, or graphs with properties.
This data model emerged in the early 2000s.
Formal definition
Building upon widely adopted definitions,[2][3] a property graph/attributed graph can be defined by a 7-tuple (N, A, K, V, α, κ, π), where
- N is the set of nodes/vertices of the graph
- A is the set of arcs (directed edges) of the graph
- K is a set of keys, taken from a countable set, defining the nature of attributes/properties
- V is a set of values, to be associated with these keys in order to define full-fledged attributes
- α : A → N × N is a total function, defining the multigraph proper. For a ∈ A, u ∈ N, v ∈ N, α(a) = (u, v) means that a is an arc of the graph having node u for origin and node v for target
- κ ⊆ (A ∪ N) × K is a binary relation over (A ∪ N) and K (formally defined as a subset of the Cartesian product (A ∪ N) × K), associating zero, one or several keys with each arc and node of the graph
- π : κ → V is a partial function, providing values for the properties of the nodes and the arcs which include them. For u ∈ N, a ∈ A and k ∈ K, π(u, k) (respectively π(a, k)) is the value associated with the property key k for the node u (respectively the arc a), if the corresponding property is defined there.
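The definition above can be sketched as a small data structure. This is only an illustrative reading of it, not a reference implementation: the class name `PropertyGraph`, the helper `is_value`, the symbol κ for the key relation, and the sample identifiers are all assumptions made for this example. V is not stored as a set; it is approximated by a membership test accepting numbers and character strings.

```python
from dataclasses import dataclass, field


def is_value(v):
    """Membership test standing in for V (assumed: numbers or strings)."""
    return isinstance(v, (int, float, str))


@dataclass
class PropertyGraph:
    nodes: set                                 # N
    arcs: set                                  # A (identifiers disjoint from N)
    keys: set                                  # K
    alpha: dict = field(default_factory=dict)  # α: arc -> (source, target)
    kappa: set = field(default_factory=set)    # κ: {(element, key), ...}
    pi: dict = field(default_factory=dict)     # π: (element, key) -> value

    def validate(self):
        """Return a list of violations of the definition (empty if none)."""
        errors = []
        elements = self.nodes | self.arcs
        # α must be total on A and map every arc into N × N.
        for a in self.arcs:
            if a not in self.alpha:
                errors.append(f"alpha is undefined for arc {a!r}")
                continue
            u, v = self.alpha[a]
            if u not in self.nodes or v not in self.nodes:
                errors.append(f"arc {a!r} has an endpoint outside N")
        # κ must be a subset of (A ∪ N) × K.
        for (x, k) in self.kappa:
            if x not in elements or k not in self.keys:
                errors.append(f"({x!r}, {k!r}) is outside (A ∪ N) × K")
        # π is partial: defined only on pairs of κ, with values in V.
        for (x, k), value in self.pi.items():
            if (x, k) not in self.kappa:
                errors.append(f"pi is defined on ({x!r}, {k!r}), not in kappa")
            if not is_value(value):
                errors.append(f"value {value!r} is outside V")
        return errors


# Two parallel arcs between the same nodes: the graph is a multigraph.
g = PropertyGraph(
    nodes={"alice", "bob"},
    arcs={"a1", "a2"},
    keys={"name", "since", "Person"},
    alpha={"a1": ("alice", "bob"), "a2": ("alice", "bob")},
    kappa={("alice", "name"), ("a1", "since"), ("bob", "Person")},
    pi={("alice", "name"): "Alice", ("a1", "since"): 2019},
)
print(g.validate())
```

Here `("bob", "Person")` belongs to κ but has no value under π, which the definition allows because π is partial.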
A complementary construct, used in several implementations of property graphs in commercial graph databases, is that of labels, which can be associated with both nodes and arcs of the graph. Labels have a practical rather than theoretical justification: they were originally intended for users of entity-relationship models and relational databases, to facilitate the import of their legacy data sets into graph databases. Labels make it possible to associate the same identifier (that of the relational table, or of the ER entity) with all graph nodes that correspond to the rows of this relational table, or to instances of the same generic entity or class. With the proposed definition, these labels can be viewed as attributes defined only by a key, without an associated value (this is why the relation κ, which attaches keys to nodes and arcs, is defined separately as a binary relation, and π as a partial function). The basic definition thus needs no separate notion of label, which keeps it simpler and satisfies a principle of parsimony. Alternatively, and more consistently, labels can be defined through type graphs, as special types associated with nodes and arcs.
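As a minimal sketch of this reading of labels, the functions below separate the keys of an element into labels (keys attached by the binary relation, written κ here, for which π defines no value) and ordinary properties (keys with a value). The function names and the sample data are hypothetical.

```python
def labels_of(element, kappa, pi):
    """Keys attached to `element` by κ for which π defines no value."""
    return {k for (x, k) in kappa if x == element and (x, k) not in pi}


def properties_of(element, kappa, pi):
    """Key/value pairs of `element`: keys in κ that π maps to a value."""
    return {k: pi[(x, k)] for (x, k) in kappa if x == element and (x, k) in pi}


# Hypothetical data: two nodes imported from a "Person" table.
kappa = {
    ("n1", "Person"), ("n1", "name"),
    ("n2", "Person"), ("n2", "Employee"), ("n2", "name"),
}
pi = {("n1", "name"): "Ada", ("n2", "name"): "Grace"}

print(labels_of("n2", kappa, pi))
print(properties_of("n1", kappa, pi))
```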
Relations with other models
Graph theory and classical graph algorithms
Attributed graphs are especially useful in that they are an "umbrella" concept (i.e. a generalization) for several key graph-theoretic models that have long been widely used in classical graph algorithms:
- Labeled graphs associate labels with each vertex and/or edge of a graph. Matched with attributed graphs, these labels correspond to attributes comprising only a key, taken from a countable set (typically a character string or an integer).
- Colored graphs, as used in classical graph coloring problems, are special cases of labeled graphs, whose labels are defined on a finite set of keys, matched to colors.
- Weighted graphs associate a numerical value with arcs/edges and, when relevant, with the vertices of a directed or undirected graph. These weights correspond to the values of a set of attributes with the same key. For example, in a model of a road network where each segment has a length and a capacity (number of vehicles per unit time), a segment can be represented by an edge with two weights.
- Flow networks are weighted graphs whose weights are interpreted as capacities. They are used in classical models of transport networks, e.g. with maximum flow algorithms.
- Shortest path problems, as solved by classical algorithms such as Dijkstra's algorithm, operate on weighted graphs whose weights correspond to distances, real or virtual.
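The correspondence between weights and attribute values can be illustrated with the road network example. In the sketch below each arc carries two properties, `length` and `capacity`, and a standard Dijkstra search uses whichever key is chosen as the weight. The arc identifiers, node names, and numbers are invented for illustration, and the chosen weights are assumed to be non-negative.

```python
import heapq

# Hypothetical road network: arc identifier -> (source, target).
alpha = {
    "r1": ("A", "B"),
    "r2": ("B", "C"),
    "r3": ("A", "C"),
    "r4": ("A", "B"),  # parallel arc, allowed in a multigraph
}

# Two properties per arc, i.e. an edge with two weights.
pi = {
    ("r1", "length"): 4.0, ("r1", "capacity"): 1200,
    ("r2", "length"): 3.0, ("r2", "capacity"): 800,
    ("r3", "length"): 9.0, ("r3", "capacity"): 2000,
    ("r4", "length"): 5.0, ("r4", "capacity"): 600,
}


def shortest_distances(source, alpha, pi, key):
    """Dijkstra's algorithm, using the property `key` as the arc weight."""
    # Build adjacency lists from the arcs that define the chosen key.
    out = {}
    for arc, (u, v) in alpha.items():
        if (arc, key) in pi:
            out.setdefault(u, []).append((v, pi[(arc, key)]))

    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in out.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


print(shortest_distances("A", alpha, pi, "length"))
```

Passing a different key selects a different weighted graph over the same arcs, which is the sense in which weights are values of attributes sharing one key.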
Knowledge graphs and RDF graphs
Knowledge graphs, usually represented in RDF, are hybrid labeled graphs whose node labels correspond to instance identifiers (IRIs) or literals, and whose edge labels identify types (not instances) of predicates. They have now acquired a visibility which tends to obscure the longer-established use of graphs as a direct model for systems of all kinds.[4] They are less versatile and expressive than attributed graphs.
Knowledge graphs capture weakly structured information about a physical system. They mix structural relationships with attached properties, and category information with instances, which obscures the structure. By contrast, graphs whose connections capture the structure of a physical system can be called cyber-physical.
Also, RDF graphs can only express first-order logic, while attributed graphs can express higher-order logic. Representing relationship properties in RDF requires a cumbersome reification process.
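To make the cost of reification concrete, the sketch below expands a single arc with one property into triples using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). Triples are modeled as plain Python tuples rather than with an RDF library, and the `ex:` names are hypothetical identifiers chosen for the example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify_arc(arc_id, source, predicate, target, properties):
    """Return the triples stating one arc and describing its properties."""
    triples = [
        # The relationship itself, as an ordinary triple.
        (source, predicate, target),
        # A resource standing for that statement, so it can be described.
        (arc_id, RDF + "type", RDF + "Statement"),
        (arc_id, RDF + "subject", source),
        (arc_id, RDF + "predicate", predicate),
        (arc_id, RDF + "object", target),
    ]
    # Each property of the arc becomes one more triple about the statement.
    for key, value in sorted(properties.items()):
        triples.append((arc_id, key, value))
    return triples


triples = reify_arc(
    "ex:a1", "ex:alice", "ex:knows", "ex:bob", {"ex:since": 2019}
)
for t in triples:
    print(t)
```

In a property graph the same information is one arc with one key-value pair; here it takes six triples.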
Standardization
The NGSI-LD data model specified by ETSI was the first attempt to standardize property graphs under a de jure standards body. Compared to the basic model defined here, the NGSI-LD meta-model adds a formal definition of basic categories (entity, relation, property) on the basis of semantic web standards (OWL, RDFS, RDF), which makes it possible to convert all data represented in NGSI-LD into RDF datasets through JSON-LD serialization. NGSI-LD entities, relations and properties are thus defined by reference to types which can themselves be defined by reference to ontologies, thesauri, taxonomies or microdata vocabularies, for the purpose of ensuring the semantic interoperability of the corresponding information.
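As a rough illustration of how a node with properties and outgoing relationships might be written as an NGSI-LD entity, the sketch below builds a dictionary in the usual normalized shape, where each attribute is an object typed either "Property" (with a "value") or "Relationship" (with an "object"). The function name, identifiers, and attribute names are hypothetical, and the JSON-LD "@context" that a real payload carries is omitted.

```python
import json


def to_ngsi_ld(entity_id, entity_type, properties, relationships):
    """Build an NGSI-LD style entity dictionary (sketch, no @context)."""
    entity = {"id": entity_id, "type": entity_type}
    # Node properties become attributes of type "Property".
    for name, value in properties.items():
        entity[name] = {"type": "Property", "value": value}
    # Outgoing arcs become attributes of type "Relationship".
    for name, target_id in relationships.items():
        entity[name] = {"type": "Relationship", "object": target_id}
    return entity


entity = to_ngsi_ld(
    "urn:ngsi-ld:Vehicle:A4567",
    "Vehicle",
    properties={"speed": 55},
    relationships={"isParked": "urn:ngsi-ld:OffStreetParking:Downtown1"},
)
print(json.dumps(entity, indent=2))
```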
The ISO/IEC JTC1/SC32/WG3 working group, which established the SQL standard, is in the process of specifying a new query language suitable for graph-oriented databases, called GQL (Graph Query Language). This standard will include the specification of a property graph data model, which should be along the lines of the basic model described here, possibly adding notions of labels, types, and schemas.
Type graphs and schemas
Graph-oriented databases are, compared to relational databases, touted for not requiring the prior definition of a schema before data can be loaded. This is desirable for environments and applications that operate under an open world assumption, such as the description of complex systems and systems of systems, which are characterized by bottom-up organization and evolution rather than control by a single stakeholder. However, even in such environments, it may be necessary to constrain the representation of specific subsets of the information entered into the database, in a way that may resemble a traditional database schema, while keeping the overall graph open to the addition of unforeseen data or configurations. For example, the description of a smart city falls under the open world assumption and will be described by the upper level of a graph database, without a schema. However, specific technical sub-systems of this city remain top-down closed-world systems managed by a single operator, who may impose a stronger structuring of information, as customarily represented by a schema.
The notions of "type graphs" and schemas[2] make it possible to meet this need, with types playing a role similar to that of labels in classical graph databases, but with the added possibility of specifying relations between these types and constraining them by keys and properties. The type graph is itself a property graph, linked by a relation of graph homomorphism with the graphs of instances that use the types it defines, playing a role similar to that of a schema in a data definition language.
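The homomorphism condition can be sketched as a simple check: every instance arc must be mapped to a type arc whose endpoints are the types of the instance arc's endpoints. This is a simplified illustration that ignores keys and property constraints; the function name, the typing dictionaries, and the sample data are assumptions made for the example.

```python
def is_homomorphism(inst_alpha, type_alpha, node_type, arc_type):
    """Check that the typing maps preserve the source and target of arcs.

    inst_alpha: instance arcs, arc id -> (source node, target node)
    type_alpha: type graph arcs, type arc id -> (source type, target type)
    node_type:  instance node -> type graph node
    arc_type:   instance arc -> type graph arc
    """
    for arc, (u, v) in inst_alpha.items():
        t = arc_type.get(arc)
        if t not in type_alpha:
            return False  # untyped arc, or type missing from the type graph
        if type_alpha[t] != (node_type.get(u), node_type.get(v)):
            return False  # endpoints do not match the typed endpoints
    return True


# Hypothetical type graph: persons live in cities.
type_alpha = {"LIVES_IN": ("Person", "City")}

# Hypothetical instance graph and its typing.
inst_alpha = {"a1": ("alice", "paris")}
node_type = {"alice": "Person", "paris": "City"}
arc_type = {"a1": "LIVES_IN"}

print(is_homomorphism(inst_alpha, type_alpha, node_type, arc_type))
```

An instance arc from `paris` to `alice` typed `LIVES_IN` would fail the check, which is how the type graph constrains instance data in the manner of a schema.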
The ontologies, thesauri or taxonomies used to reference NGSI-LD types are also defined by graphs, but these are RDF graphs rather than property graphs, and they typically have broader scopes than database schemas. NGSI-LD types allow type graphs to be used together with references to external ontologies, which makes it possible to enforce strong data structuring and consistency while affording semantic grounding and interoperability.
References
[1] Angles, Renzo (2012-04-01). "A comparison of current graph database models". International Conference on Data Engineering. IEEE.
[2] Bonifati, Angela; Furniss, Peter; Green, Alastair; Harmer, Russ; Oshurko, Eugenia; Voigt, Hannes (2019), Laender, Alberto H. F.; Pernici, Barbara; Lim, Ee-Peng; de Oliveira, José Palazzo M. (eds.), "Schema Validation and Evolution for Graph Databases", Conceptual Modeling, vol. 11788, Cham: Springer International Publishing, pp. 448–456, arXiv:1902.06427, doi:10.1007/978-3-030-33223-5_37, ISBN 978-3-030-33222-8, retrieved 2021-09-15
[3] Gutierrez, Claudio; Hidders, Jan; Wood, Peter T. (2018), "Graph Data Models", in Sakr, Sherif; Zomaya, Albert (eds.), Encyclopedia of Big Data Technologies, Cham: Springer International Publishing, pp. 1–6, doi:10.1007/978-3-319-63962-8_81-1, ISBN 978-3-319-63962-8, retrieved 2021-09-15
[4] Privat, Gilles; Abbas, Abdullah, "'Cyber-Physical Graphs' vs. RDF graphs", W3C Workshop on Web Standardization for Graph Data, Berlin, March 2019