
Using Neo4j to give WordPress some relationship advice


It would have been a glorious moment for Conan the Barbarian as he trampled over the orcs, having picked up his mighty Long Sword +2. With his bash skill and his strength modifier he had +6 to the strike, and with Rincewind’s timely casting of the Raise Strength spell it would double to +12. He was invincible…

Yet it was not to be. As the developer struggled to find a solution, Conan went to make that mighty attack, and it was the server, not the orcs, that died.

“Clearly,” thought the developer, “what I need is a mighty graph database.” His second thought was that it was probably about time for tea.


When custom post types were introduced into WordPress in deepest, darkest history, they changed WordPress forever. Not only did they open up WordPress as a CMS but, perhaps more than anything else, they enforced the “WordPress Way”.

There is a stumbling block with custom post types, and posts in general: they have no concept of relationships with each other. For example, if I create a post type called “Companies” and another called “Products”, I have no way to associate Products with a Company out of the box. Solutions do exist, like the Posts 2 Posts plugin, which provide an abstracted way to create and query relationships. However, building complex queries on top of these relationships is not simple, and running them can be resource-intensive.

The answer, short of a massive rewrite of the WordPress database, is to look beyond WordPress. Imagine if we were to push this data into something that understands relationships and lets us write complicated queries to answer common questions. You know, questions like “Which characters are spell casters?” or “What skills does Jon Snow have?” Those sorts of important WordPress questions. Oh, and boring things like who should see a friend’s status.

Neo4j – Introducing a Graph Database Engine

Neo4j is a database that could be used as a wholesale replacement for a traditional SQL database; however, rather than storing data in tables, it uses graphs. Neo4j is one of the most popular graph database engines out there, and the community edition is GPL based. Like Elasticsearch, MySQL and similar projects, it is a dual product: a community edition, and a commercial version with additional features and a support package.

In Neo4j data is stored in nodes, and every node can have multiple attributes (Neo4j itself calls them properties). For example, in the WordPress world an individual post would be its own node, with its content and post meta being its attributes. Individual users and taxonomies would likewise be their own nodes. Nodes can be grouped collectively by labels, so all posts from a custom post type could be collected together. Nodes are interconnected through edges (in graph speak), which reflect relationships. To make selecting and manipulating nodes and relationships easier in Cypher (Neo4j’s query language), pretty much everything can be labeled: nodes carry labels and relationships carry a type, and both act as selectors.

Let’s see a simple example of creating a user in Cypher:


CREATE (n:User { username : 'tnash', user_id : 2, email : 'tim@example.com', display_name : 'Tim Nash' })

This creates a single node with the label User (to help us collate it with other nodes for querying) and the attributes username, user_id, email and display_name.

If we wanted to see all users (that is, all the nodes labeled User) we could do something like:


MATCH(n:User)
RETURN n

This returns all users; in this case, just our single one.

So let’s create a post:


CREATE (n:Post { title : 'Hello World', post_id : 1, slug : 'hello-world', content : ' Hello world! Welcome to WordPress. This is your first post. Edit or delete it, then start blogging! ' })

Again, we have created a node with a new label, Post, and some attributes. All pretty simple, so let’s create a relationship:


MATCH (a:User),(b:Post)
WHERE a.username = 'tnash' AND b.slug = 'hello-world'
CREATE (a)-[r:authored]->(b)
RETURN r

So we are looking at nodes labeled User and Post, specifically the user whose username attribute is tnash and the post whose slug is hello-world. We then create a relationship labeled “authored” between them, with the arrow indicating that the user authored the post.
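To check that the relationship is really there, we can read it back. A minimal sketch, using only the labels and attributes created above:

```cypher
// Find every post authored by the user tnash,
// following the authored relationship in its stored direction.
MATCH (u:User { username: 'tnash' })-[:authored]->(p:Post)
RETURN p.post_id, p.title
```

With only the two nodes above in the database, this should come back with the single Hello World post.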

One of the nice things with Neo4j is that you can use its admin browser to visualise your graph, for example:

[Image: graph of the nodes and relationship created above]

OK, so not very thrilling, but you can build up complicated queries very quickly.

Let’s take a more interesting example: a partial view of a simple RPG game built in WordPress, for which we have the following custom post types.

  • Characters
  • Classes
  • Abilities

Abilities also have a custom taxonomy called type; the rest have various post meta.

In addition, Characters have post meta indicating their Class (via post ID) and a post meta list of Abilities. These are built using the CMB library, but could equally be built by hand or with Advanced Custom Fields and its Repeater field.

The result, when pushed into Neo4j, looks like this:
[Image: graph of the Character, Class and Ability nodes]

So each custom post type becomes a label on nodes, and the post meta that linked to other nodes via post ID has been turned into edges to create relationships.
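As a rough sketch of what that push could look like in Cypher for a single character: the post IDs, the name and type attributes, the direction of the relationships, and the idea that Tracking comes from the Ranger class are all assumptions made for illustration. The relationship names is and has are borrowed from the queries later in this post.

```cypher
// Hypothetical post IDs; MERGE keeps the query safe to re-run.
MERGE (c:Character { post_id: 101 })
MERGE (k:Class { post_id: 201 })
MERGE (s:Ability { post_id: 301 })
// Post meta that pointed at other post IDs becomes relationships.
MERGE (c)-[:is]->(k)
MERGE (k)-[:has]->(s)
// Only the attributes worth querying are copied across.
SET c.name = 'Jon Snow',
    k.name = 'Ranger', k.spellcaster = true,
    s.name = 'Tracking', s.type = 'skill'
```

Matching on post_id alone and setting the other attributes afterwards means an edited post updates its existing node rather than creating a duplicate.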

So what sort of questions can we ask now, with this more fun data set?

Which Characters are Spellcasters?

Spellcaster is an attribute of a Class node, so to find the characters we must match against both characters and classes, find the characters that have a relationship with a class, and keep only the classes whose spellcaster attribute is true.


MATCH(a:Character),(b:Class)
WHERE (a)--(b) AND b.spellcaster = true
RETURN a

This returns Rincewind and Jon Snow as spell casters, as the Ranger and Mage classes can both cast spells. Conan is left out: being a fighter, he can’t cast spells.
[Image: graph of the query result]
Note: the picture actually shows the above query with RETURN a,b so you can see the classes as well.
We could optimise this query significantly, and when we return the result we would more likely want to return simply the post_id for each character, which we can then use in WordPress to pull the full character object.
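One possible tightening, as a sketch rather than a benchmarked optimisation: put the relationship and the attribute test straight into the pattern, and return only the post_id. The is relationship type is assumed from the later examples; swap it for -- to match any relationship, as the query above does.

```cypher
// Characters linked to a spell-casting class, returned as WordPress post IDs.
MATCH (a:Character)-[:is]-(b:Class { spellcaster: true })
RETURN DISTINCT a.post_id AS post_id
```

DISTINCT guards against a character appearing twice if it is linked to more than one spell-casting class.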

Let’s go with something more complex.

What Skills does Jon Snow have?

A skill is a type (attribute) of Ability node, which may be related to Jon Snow directly, through his class, or through something he owns.
In Neo4j we would write a query like this:


MATCH (a:Character {name : "Jon Snow" }),(b:Ability)
WHERE b.type = "skill" AND ( (a)--(b) OR (a)-[:is|owns]-()-[:has]-(b))
RETURN b

This returns Speak with Animals and Tracking; the other abilities he has are not skills, but feats or other abilities. In this query we match the character Jon Snow and Abilities, then look for Abilities with the type skill that have either a direct relationship with Jon Snow, (a)--(b), or a relationship through another node which he is or owns and which has the ability, (a)-[:is|owns]-()-[:has]-(b).

Neo4j & WordPress

So up to now I have been using pure Cypher for the examples; how do you get it working with WordPress? Neo4j has a REST API, along with a couple of dedicated PHP libraries. Several JavaScript libraries also exist, which might be more appropriate depending on circumstances. When initially choosing a PHP library I chose Neoxygen NeoClient. However, this is not essential, and calls can be made with the WP_HTTP API.

So this is the point where you are expecting a nice link to GitHub with a lovely wrapper that creates nodes and relationships directly on saves, and provides a WPDB-like interface for interacting with Cypher, right?

Sorry to disappoint. It’s not quite that simple. As already mentioned, WordPress has no concept of relationships between posts, so all of that has to be built first. Then you have to decide what content to push: you could push all metadata and all posts into Neo4j along with users and taxonomies, or just a small subset. I tend to push just the information I need to query effectively, and then grab post objects based on the IDs returned from Neo4j.

The RPG example has a dozen or so post types. Not all were pushed into Neo4j, and for those that were, only specific post meta was pushed in, to create attributes that might need to be queried against. So, for example, “descriptions” and similar were not put in, nor does Neo4j have a concept of the post state, as only published posts are included.

In effect Neo4j was being used purely as a tool to graph and explore relationships and query that data, not as a mechanism to retrieve content. Consequently nearly all queries returned the post_id attribute, which was the WordPress post ID, and the post could then be retrieved via either the loop or the WP-API.

[Image: wordpress-stack]

When describing Neo4j to those interested in WordPress circles, I tend to liken it to Elasticsearch: a tool that sits almost as a middle layer between WordPress and the database. As with Elasticsearch, we have to push data into Neo4j before it becomes useful, and we must keep Neo4j in sync with our MySQL database.

My current setup

This is more an aside, as I do intend to put together a more generic setup and set of classes, but for those interested in my current projects I have the following setup:

  • An MU plugin, which is nothing more than a wrapper to load neoclient and grab config out of wp-config.php
  • A plugin that acts as the sync with Neo4j, hooked into post and user update actions and making changes as needed. Each post type is “registered” with the plugin, along with which labels, attributes and relationships to pass.
  • A plugin that extends the WP-API to proxy Cypher queries, with the option to return the WordPress post object or objects instead of simply the returned attribute. It also provides object caching.
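To give a flavour of what the sync plugin might send to Neo4j, here is a minimal sketch of the Cypher involved. It is an illustration, not the actual plugin’s queries: the parameter names ($post_id, $title, $slug) are hypothetical and would be supplied by whatever PHP code makes the call, and the $ parameter syntax assumes a reasonably recent Neo4j version.

```cypher
// On save of a published post: create the node if missing, then update it.
MERGE (p:Post { post_id: $post_id })
SET p.title = $title, p.slug = $slug;

// When a post is unpublished or deleted: remove the node
// together with any relationships attached to it.
MATCH (p:Post { post_id: $post_id })
DETACH DELETE p;
```

These are two separate statements, each run from its own hook. Passing values as parameters rather than concatenating them into the query string avoids quoting problems with post content.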

Custom post type post meta is made available in the admin interface through the HumanMade CMB library. As a byproduct, relationships are also stored as post meta, even though that copy is not going to be used in WordPress. While this bloats the MySQL DB a bit, for me the convenience of using CMB outweighs the bloat.

This approach is a bit hacky and I’m in the process of refining it, swapping neoclient out in favour of the WP-API and combining the pieces so that they extend the WP-API as well as returning PHP objects.

The emphasis on the WP-API is simply because both projects I have been using Neo4j on have relied heavily on the WP-API and JavaScript. In theory I could have called Neo4j directly, but then I would have had to make further queries whenever I wanted any additional data not held in Neo4j. Those objects still need to be generated, but proxying through the API reduced the number of calls.

So why not rewrite WordPress using Neo4j?

After all, db.php is pluggable, so surely it would make more sense to rebuild WordPress to use a graph database? Well, there are problems. While it’s possible to move from MySQL to another engine, for example PostgreSQL, it’s not trivial, and it is only possible because they are both SQL databases, so when a SQL query is run against them they know how to handle it. Now, while Neo4j has tools to help convert a SQL database, it doesn’t have an abstraction layer for handling SQL queries, so you would need to rewrite every SQL query in core and in any plugins, or build an abstraction layer. That would not be fun, and on the next core update it would be even less fun. What’s more, even once we had done this, we would be back to the original problem: WordPress doesn’t really do relationships between post types, so we would still be handling these manually.

When to use a Graph Engine

So the RPG example was fun, but what practical use would investigating relationships be in slightly more complex WordPress sites? Well, here are the sorts of things you could do, and may have tried to do, in a WordPress app.

  • If you have the post types Brands and Products, and Products has a taxonomy that includes the category Children’s toys, you could find all Brands that sell children’s toys.
  • Create related-post queries based not just on the current post’s tags, but also on the tags used by other posts that share a tag with it.
  • On BuddyPress or similar sites, determine who sees which status feed updates.
  • On a membership site, see which posts a user has access to.
  • Find best-selling products, or items to bundle, in an e-commerce site.
  • Find associations between objects, be they people, movies, websites or indeed any object.
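The first two ideas in the list can be sketched in Cypher as follows. Every label, relationship name and attribute here (Brand, Product, Category, Tag, sells, in_category, tagged, name) is hypothetical, chosen for illustration; your own graph would use whatever you registered when pushing data in.

```cypher
// Brands that sell at least one product in the Children's toys category.
MATCH (brand:Brand)-[:sells]->(:Product)-[:in_category]->(:Category { name: "Children's toys" })
RETURN DISTINCT brand.post_id AS post_id;

// Related posts for post 1: follow its tags to neighbouring posts, then
// follow those posts' tags one step further, ranking by number of paths found.
MATCH (p:Post { post_id: 1 })-[:tagged]->(:Tag)<-[:tagged]-(:Post)-[:tagged]->(:Tag)<-[:tagged]-(related:Post)
WHERE related <> p
RETURN related.post_id AS post_id, count(*) AS score
ORDER BY score DESC
LIMIT 5;
```

These are two independent queries, to be run separately. In SQL each would take several joins across the posts, term relationship and term taxonomy tables; here the path is written out in roughly the order you would say it aloud.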

Basically, imagine any query you have written with more than a single join, or where you have used multiple loops, and chances are a graph database could have simplified it. While not the salvation of all messy queries by any stretch, a graph-based solution can help solve some headaches.

Where to go next with Neo4j and WordPress

This was meant as an introduction to using Neo4j with WordPress. While I said that I didn’t think it would be possible to build a generic drop-in for Neo4j, I do think some common libraries could be built. Using my current setup I have started to develop a wpneo4j class for interacting with Neo4j databases, and I am looking at ways to introduce graph concepts in a “WordPress” way, for example allowing you to register a custom post type as a label.

In the meantime there is nothing to stop you getting started right now, using the neoclient PHP library or one of the JavaScript libraries.

To get Neo4j up and running you have two choices: either set it up in your own stack, or use http://www.graphenedb.com/, which is a free hosted service.

To run it in your own stack, download Neo4j from http://neo4j.com/download/, selecting the Community Edition. It’s a Java application, so it should run pretty much anywhere, but it does require a relatively recent Java SDK version (the default Mac one won’t do). Once downloaded, unpack it, cd into its bin folder and run:


./neo4j start

[Image: neo4j-interface]

If you have left everything as default, you can access the interface at http://localhost:7474/, which will prompt you to create a user. From there the default screen provides a few examples to follow, and you can type in Cypher queries and visualise the response. With that user’s connection details you can then connect using neoclient and start exploring.

This has been a really quick intro to graph databases and to using them alongside WordPress’s more traditional SQL database. Next time we will go into some code examples and start diving into more complicated queries across multiple post types and taxonomies.

So at this stage, you think I’m mad, you are completely lost, or you are still wondering what the point is. Hopefully a few of you are thinking, this is going to change EVERYTHING! Whichever state it is, why not let me know in the comments.
