MetClassNet data are stored in a Neo4j graph database, which can be queried using the Cypher language. There is a plethora of tutorials for Neo4j neophytes, and the official neo4j.com site is a very good place to start. Following that guide is recommended; however, for a quick hands-on session aimed at people already familiar with graphs, some basics of the query language are covered here. Experienced users, and people who are just here to have a look, can jump directly to the sample queries section to see the kind of relevant information that can be extracted from our system.
- Get the list of node and edge types ("node labels" and "relationship types" in Neo4j parlance)
CALL db.relationshipTypes()
CALL db.labels()
- show how they are connected (the data model):
CALL db.schema.visualization()
- see how many of them are in there:
MATCH (n)
RETURN count(n)
// directed pattern, so that each relationship is counted only once
MATCH ()-[r]->()
RETURN count(r)
- see the kind of attributes ("properties") they have
MATCH (n)
UNWIND keys(n) as k
RETURN distinct count(n),k,labels(n)
// directed pattern, so that each relationship is counted only once
MATCH ()-[r]->()
UNWIND keys(r) as k
RETURN distinct count(r),k,type(r)
note: if the total number of nodes and relationships is large, a random sample can be extracted using MATCH (n) WHERE rand() <= 0.10 (here, about 10% of the nodes are sampled)
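As a minimal sketch, the sampling filter from the note can be combined with the property listing above. The column aliases (node_labels, property, sampled_nodes) are only illustrative names, and the counts refer to the sample, not to the whole database:

```cypher
// Sample roughly 10% of the nodes, then list their property keys per label set
MATCH (n)
WHERE rand() <= 0.10
UNWIND keys(n) AS k
RETURN labels(n) AS node_labels, k AS property, count(n) AS sampled_nodes
ORDER BY sampled_nodes DESC
```

Since rand() is evaluated for each node, the result changes slightly from one run to the next.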
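- get the list of functions that can be used in Cypher (the count(), keys(), type() and labels() functions used in the previous steps can be found here):
CALL dbms.functions()
note: dbms.functions() was removed in Neo4j 5; on recent versions, use SHOW FUNCTIONS instead.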
- write a Cypher query
the Cypher Query Language syntax:
() : a node
--> : an edge (a.k.a. relationship)
(n) : a node stored in a variable named n
-[e]-> : an edge stored in a variable named e
(:feature) : a node of a given class (a.k.a. label)
-[:pears]-> : a relationship of a given class (a.k.a. type)
(n {Formula:"C6H12O6"}) : a node with a given attribute (a.k.a. property)
-[e {massdifference:38}]-> : an edge with a given property
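The notations above can be combined in a single pattern. The following is a small sketch that reuses the feature label, the mzdiff relationship type, and the name property that appear in the sample queries of this page; the node name is only an example value:

```cypher
// A :feature node with a given name, linked by an :mzdiff relationship
// (stored in variable e) to any other :feature node (stored in variable m)
MATCH (n:feature {name: "Cluster_0619"})-[e:mzdiff]-(m:feature)
RETURN n, e, m
```

Leaving out the arrowhead, as done here, matches the relationship in either direction.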
Main Cypher clauses:
A basic Cypher query is usually structured in 3 parts:
- a pattern : a description of the kind of subgraph we're interested in.
Example: a peak in the experimental layer + its neighbors that are mapped in the knowledge layer
- an anchor : constraints that define where the pattern will be searched. Some basic property/type constraints can also be defined directly in the pattern.
Example: the peak with mass xxxx in experimental layer + its neighbors that are mapped in the knowledge layer
- a format : a description of what to keep and how we want the information to be displayed.
Example: from the peak with mass xxxx in experimental layer + its neighbors that are mapped in the knowledge layer, get the list of names and masses of the neighbors.
| Clause | Description |
|---|---|
| PATTERN | |
| MATCH | Specifies the patterns to search for in the database. |
| OPTIONAL MATCH | Specifies the patterns to search for in the database while allowing missing parts. |
| ANCHOR | |
| WHERE | Adds constraints to the patterns in a MATCH or OPTIONAL MATCH clause. |
| AND/OR/NOT | Logical operators used to combine constraints. |
| FORMAT | |
| RETURN | Defines what to include in the query result set. This can be the matching nodes and/or relationships as a subgraph, the values of some of their properties, or the results of functions applied to them (count(), max(), sum()). |
| RETURN ... AS | Sets the result format by defining a column alias. |
| ORDER BY [ASC/DESC] | Follows a RETURN clause, specifying that the output should be sorted in either ascending (the default) or descending order. |
| LIMIT | Follows a RETURN clause, specifying the maximum number of results to output. |
| DISTINCT | Removes duplicate values. |
Example: from the peak with mass xxxx in experimental layer + its neighbors that are mapped in the knowledge layer, get the list of names and masses of the neighbors.
MATCH (n1:feature)-[:mzdiff]-(n2:feature)
WHERE n1.value = xxxx
AND (n2)-[:match]-(:metabolites)
RETURN n2.mass AS neighbor_mass, n2.name AS neighbor_name
ORDER BY neighbor_mass DESC
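The remaining clauses of the table (OPTIONAL MATCH, DISTINCT, LIMIT) can be illustrated with a variant of this query. This is only a sketch: the name property on metabolites nodes is an assumption, and the feature name is an example value.

```cypher
// Neighbors of a given feature, with their mapped metabolite when there is one.
// With OPTIONAL MATCH, unmapped neighbors are kept and metabolite_name is null.
MATCH (n1:feature)-[:mzdiff]-(n2:feature)
WHERE n1.name = "Cluster_0619"
OPTIONAL MATCH (n2)-[:match]-(m:metabolites)
RETURN DISTINCT n2.name AS neighbor_name, m.name AS metabolite_name
ORDER BY neighbor_name
LIMIT 20
```

Replacing OPTIONAL MATCH with MATCH would drop the neighbors that have no mapped metabolite.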
More complex queries can be built by combining basic ones:
- nested queries : where an anchor can be defined as matching another pattern.
- chained queries : where the results of a query are passed as input to another one.
- combined queries : where the results are a combination of the results of multiple queries.
| Clause | Description |
|---|---|
| WHERE EXISTS { ... } | Uses a nested subquery to define constraints on the patterns. |
| WITH ... [AS] | Same as RETURN, but creates a chained query by piping the results from one part to the next. AS can be used to store results in variables. Functions can also be used to process results before piping. |
| UNION | Combines the results of multiple queries into a single result set. Duplicates are removed. |
| UNION ALL | Combines the results of multiple queries into a single result set. Duplicates are retained. |
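The three constructions can be sketched as follows, reusing the labels and relationship types of the examples on this page. The degree threshold of 10 is an arbitrary value chosen for illustration, and the queries are meant to be run one at a time:

```cypher
// Nested: keep only the features that have at least one mapped metabolite
MATCH (n:feature)
WHERE EXISTS { MATCH (n)-[:match]-(:metabolites) }
RETURN n.name;

// Chained: compute the degree of each feature, then keep only the hubs
MATCH (n:feature)--(m)
WITH n, count(m) AS degree
WHERE degree > 10
RETURN n.name, degree
ORDER BY degree DESC;

// Combined: names of the features involved in either kind of relationship
MATCH (n:feature)-[:mzdiff]-(:feature)
RETURN n.name AS name
UNION
MATCH (n:feature)-[:pears]-(:feature)
RETURN n.name AS name;
```

Note that the parts of a UNION must return the same column names, which is why both use the name alias.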
- get features' degree (number of neighbors)
MATCH (n1:feature)--(n2)
RETURN n1.name, count(n2) AS degree
ORDER BY degree DESC
- get the ego network (network of neighbors) of a given node
MATCH (n1:feature {name:"Cluster_0619"})-[r1]-(n2)
OPTIONAL MATCH (n2)-[r2]-(n3)--(n1)
RETURN r1,n2,r2,n3
- get pairs with mzdiff match + correlation
MATCH (n1:feature)-[diff:mzdiff]-(n2:feature)
MATCH (n1)-[corr:pears]-(n2)
RETURN n1.name, n2.name, diff.value, corr.pearson
- show the subclasses of a chemical class
MATCH p=(c:n4sch__Class {n4sch__label:"Flavonoids"})<-[*]-(c2)
RETURN p;
This query searches for a path of undefined length, using the wildcard [*]. A specific length can be set using [*3], and a range using [*1..3].
⚠ Please note that this kind of query (path search) can take a lot of time to compute if not well constrained. ⚠
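One way to keep such a query constrained is to bound the path length and cap the number of results. The bounds used below (at most 3 hops, 25 paths) are arbitrary values chosen for illustration:

```cypher
// Subclasses of "Flavonoids" at most 3 levels down, limited to 25 paths
MATCH p=(c:n4sch__Class {n4sch__label:"Flavonoids"})<-[*1..3]-(c2)
RETURN p
LIMIT 25
```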
- get the transitions whose mass difference falls within a range of values
MATCH (u:feature)-[e:mzdiff]-(v:feature)
WHERE 50 < toFloat(e.massdifference) <= 70
RETURN DISTINCT e.value
While Neo4j is optimized for queries involving network traversal, selecting on property values isn't its strength compared to traditional SQL databases.
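When lookups on a given property are frequent, a database administrator can mitigate this by indexing that property. The statement below is a sketch only: it assumes write access to the database and Neo4j 4.0 or later, and the index name feature_name is hypothetical.

```cypher
// Index the name property of :feature nodes to speed up lookups by name
CREATE INDEX feature_name
FOR (n:feature) ON (n.name)
```

Queries that anchor on the indexed property, such as MATCH (n:feature {name:"Cluster_0619"}), can then use the index instead of scanning every feature node.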