Neo4MetClassNet/tutorial.md at main · MetClassNet/Neo4MetClassNet

Exploitation of MetClassNet data

MetClassNet data are stored in a Neo4j graph database, which can be queried using the Cypher language. There is a plethora of tutorials for Neo4j neophytes, and the official documentation at neo4j.com is a very good place to start. Following such a guide is recommended; for a quick hands-on session aimed at people already familiar with graphs, some basics of the query language are covered here. Experienced users, and people just here to have a look, can jump directly to the sample queries section to see the kind of information that can be extracted from our system.

How to start: What's in there?

Get the list of node and edge types ("node labels" and "relationship types" in Neo4j parlance):

CALL db.relationshipTypes()

CALL db.labels()

show how they are connected (the data model):

CALL db.schema.visualization()

see how many of them are in there:

MATCH (n)
RETURN count(n)

MATCH ()-[r]->()
RETURN count(r)

see the kind of attributes ("properties") they have

MATCH (n)
UNWIND keys(n) as k
RETURN distinct count(n),k,labels(n)

MATCH ()-[r]->()
UNWIND keys(r) as k
RETURN distinct count(r),k,type(r)

note: if the total number of nodes and relationships is large, a random sample can be extracted using `MATCH (n) WHERE rand() <= 0.10` (here, about 10% of nodes are sampled)
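
As a minimal sketch of the sampling note above, the filter can be placed directly in the property-listing query. The 0.10 threshold is only illustrative, and since `rand()` is evaluated for each node, the size of the sample varies from run to run.

```cypher
// Keep each node with a probability of about 0.10,
// then list the property keys found per label combination
MATCH (n)
WHERE rand() <= 0.10
UNWIND keys(n) AS k
RETURN labels(n) AS labels, k AS property, count(n) AS sampled_nodes
```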

How to start: What can we do?

get the list of functions that can be used in Cypher (the `count()`, `keys()`, `type()` and `labels()` functions used in the previous steps are listed there); note that Neo4j 5 removed this procedure in favour of `SHOW FUNCTIONS`:

CALL dbms.functions()

write a Cypher query

the Cypher Query Language syntax:

() : a node
--> : an edge (a.k.a. relationship)

(n) : a node stored in a variable named n
-[e]-> : an edge stored in a variable named e

(:feature) : a node of a given class (a.k.a. label)
-[:pears]-> : a relationship of a given class (a.k.a. type)

(n {Formula:"C6H12O6"}) : a node with a given attribute (a.k.a. property)
-[e {massdifference:38}]-> : an edge with a given property
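
The building blocks above can be combined into a single pattern. The sketch below reuses the `feature` label and the `mzdiff` relationship type that appear in this tutorial; the variable names and the `LIMIT` value are arbitrary choices made for illustration.

```cypher
// Two labelled nodes linked by a typed relationship, each stored in a variable.
// Leaving out the arrowhead matches the relationship in either direction.
MATCH (n:feature)-[e:mzdiff]-(m:feature)
RETURN n, e, m
LIMIT 10
```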

Main Cypher clauses:

A basic Cypher query is usually structured in 3 parts:
- a pattern: a description of the kind of subgraph we're interested in.
Example: a peak in an experimental layer + its neighbors that are mapped in the knowledge layer
- an anchor: constraints that define where the pattern will be searched. (Some basic property/type constraints can be defined directly in the pattern.)
Example: the peak with mass xxxx in an experimental layer + its neighbors that are mapped in the knowledge layer
- a format: a description of what to keep and how we want the information to be displayed.
Example: from the peak with mass xxxx in an experimental layer + its neighbors that are mapped in the knowledge layer, get the list of names and masses of the neighbors.

| Clause | Description |
| --- | --- |
| PATTERN | |
| `MATCH` | Specifies the patterns to search for in the database. |
| `OPTIONAL MATCH` | Specifies the patterns to search for in the database while allowing missing parts. |
| ANCHOR | |
| `WHERE` | Adds constraints to the patterns in a `MATCH` or `OPTIONAL MATCH` clause. |
| `AND`/`OR`/`NOT` | Logical operators to combine constraints. |
| FORMAT | |
| `RETURN` | Defines what to include in the query result set. Can be the matching nodes and/or relationships as a subgraph, the values of some of their properties, or the results of functions applied to them (`count()`, `max()`, `sum()`). |
| `RETURN ... AS` | Sets the result format by defining a column alias. |
| `ORDER BY [ASC/DESC]` | Follows a `RETURN` clause, specifying that the output should be sorted in either ascending (the default) or descending order. |
| `LIMIT` | Follows a `RETURN` clause, specifying a maximum number of results to output. |
| `DISTINCT` | Removes duplicate values. |

Example: from the peak with mass xxxx in experimental layer + its neighbors that are mapped in the knowledge layer, get the list of names and masses of the neighbors.

MATCH (n1:feature)-[:mzdiff]-(n2:feature)
WHERE n1.value = xxxx
AND (n2)-[:match]-(:metabolites)
RETURN n2.mass AS neighbor_mass, n2.name AS neighbor_name
ORDER BY neighbor_mass DESC
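
`OPTIONAL MATCH` from the table above is not used in this example, so here is a hedged sketch of it. It assumes that nodes labelled `metabolites` carry a `name` property, which this tutorial does not confirm; features without a mapped metabolite are still returned, with a null in the second column.

```cypher
// List features together with their mapped metabolite, if any.
// ASSUMPTION: metabolites nodes have a `name` property.
MATCH (n:feature)
OPTIONAL MATCH (n)-[:match]-(m:metabolites)
RETURN n.name AS feature, m.name AS metabolite
LIMIT 25
```

With a plain `MATCH` in place of `OPTIONAL MATCH`, features lacking a `match` relationship would be dropped from the result.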

Going further: performing complex cypher queries

More complex queries can be built by combining basic ones:
- nested queries: where an anchor can be defined as matching another pattern.
- chained queries: where the results of a query are passed as input to another one.
- combined queries: where the results are a combination of the results of multiple queries.

| Clause | Description |
| --- | --- |
| `WHERE EXISTS { ... }` | Uses a nested subquery to define the constraints on the patterns. |
| `WITH ... [AS]` | Same as `RETURN`, but creates a chained query by piping the results from one part to the next. `AS` can be used to store results in variables. Functions can also be used to process results before piping. |
| `UNION` | Combines the results of multiple queries into a single result set. Duplicates are removed. |
| `UNION ALL` | Combines the results of multiple queries into a single result set. Duplicates are retained. |
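
The three kinds of composition can be sketched as follows, reusing the labels and relationship types seen earlier in this tutorial. The degree threshold of 10 is an arbitrary value chosen for illustration, and the three statements are meant to be run one at a time.

```cypher
// Nested query: keep only features that are mapped to a metabolite
MATCH (n:feature)
WHERE EXISTS { MATCH (n)-[:match]-(:metabolites) }
RETURN n.name AS feature;

// Chained query: compute each feature's degree, then filter on it
MATCH (n:feature)--(m)
WITH n, count(m) AS degree
WHERE degree > 10
RETURN n.name AS feature, degree
ORDER BY degree DESC;

// Combined query: features with a mass difference link or a correlation link
MATCH (n:feature)-[:mzdiff]-() RETURN n.name AS feature
UNION
MATCH (n:feature)-[:pears]-() RETURN n.name AS feature;
```

Both halves of a `UNION` must return the same column names, hence the shared `feature` alias.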

In practice: Sample queries

get features' degree (number of neighbors)

MATCH (n1:feature)--(n2) 
RETURN n1.name, count(n2) AS degree 
ORDER BY degree DESC

get the ego network (network of neighbors) of a given node

MATCH (n1:feature {name:"Cluster_0619"})-[r1]-(n2) 
OPTIONAL MATCH (n2)-[r2]-(n3)--(n1) 
RETURN r1,n2,r2,n3

get pairs with mzdiff match + correlation

MATCH (n1:feature)-[diff:mzdiff]-(n2:feature)
MATCH (n1)-[corr:pears]-(n2)
RETURN n1.name, n2.name, diff.value, corr.pearson

shows subclasses of a chemical class

MATCH p=(c:n4sch__Class {n4sch__label:"Flavonoids"})<-[*]-(c2) 
RETURN p;

This query searches for paths of unbounded length, using the wildcard [*]. A specific length can be set using [*3], and a range using [*1..3].
⚠ Please note that this kind of query (path search) can take a lot of time to compute if not well constrained. ⚠
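
One way to keep such a search constrained is to bound the path length and cap the number of results. The sketch below is the same subclass query with a range of 1 to 3 hops; both the range and the `LIMIT` of 100 are illustrative values, not recommendations from the MetClassNet authors.

```cypher
// Subclasses of "Flavonoids" at most 3 levels down, capped at 100 paths
MATCH p=(c:n4sch__Class {n4sch__label:"Flavonoids"})<-[*1..3]-(c2)
RETURN p
LIMIT 100
```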

get transitions within a range of mass difference values

MATCH (u:feature)-[e:mzdiff]-(v:feature) 
WHERE 50 < toFloat(e.massdifference) <= 70
RETURN DISTINCT e.value

While Neo4j is optimized for queries involving network traversal, filtering on property values isn't its strength compared to traditional SQL databases.
