Using PowerBI with Neo4j

There's an excellent post by Cédric Charlier over at his blog about hooking Neo4j into PowerBI. It's simple to follow and gets you up and running, but I (as a Power BI newbie) had a couple of spots where I ran into trouble – generally with assumptions that you already know how to navigate around the Power BI interface. (I didn't.)

So, here is a simple tutorial to get us non-BI people up and running!

The Setup Steps

First – we’ve got to install PowerBI – now, I didn’t sign up for an account, but downloaded it from the PowerBI website, and installing was simple and quick.

We also need to have Neo4j running, and you can use Community or Enterprise, it matters not – and we’ll want to put the ‘Movies’ dataset in there, so run your instance, and execute:

:play movies

 

Now we’re ready to ‘BI’!

Step 1 – Start Power BI Desktop

This is pretty obvious, but in case you need it – click on the ‘Power BI Desktop’ link in your start menu – or double click on it if you went and put it on the Desktop. Crazy days.

Step 2 – Click on ‘Get Data’

image

That way we can get data!

Step 3 – Select ‘Blank Query’

Why not 'Web', you ask? Well, as we're going to do some copy/pasting, it's easier from a blank query point of view.

image

Step 4 – Advanced

In the query editor window that pops up, select ‘Advanced Editor’

image

Step 5 – Get Data!

We're going to use the same query as Cédric, so you can use this post to augment his. In the query editor, simply paste:

let
    Source = Web.Contents( "http://localhost:7474/db/data/transaction/commit",
             [
                 Content=Text.ToBinary("{
                          ""statements"" : [ {
                          ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
             }")]
             )
in
    Source

Oh noes! The same error as Cédric got – authentication. You can't send the login details by changing the URL to something like:

http://user:pass@localhost….

as that also fails, but you can send in the auth as a header, by adding this line:

Headers = [#"Authorization" = "Basic bmVvNGo6bmVv"],

What is this bmVvNGo6bmVv? Well, that’s the base64 encoded user/pass combo – which is a bit uh oh as you have to generate this 🙁

I’ve got two options here – LinqPad and Powershell

LinqPad

Using this bit of C# – obviously you can write your own C# app in VS or whatever, but typically I use LinqPad for quick scripts.

var username = "neo4j";
var password = "neo";

var encoded = Encoding.ASCII.GetBytes(string.Format("{0}:{1}", username, password));
var base64 = Convert.ToBase64String(encoded);

base64.Dump();

 

Powershell

This does pretty much the same, but can obviously be run in a Powershell prompt – which is nice!

 

Param(
    [string]$username,
    [string]$password
)

$encoder = [system.Text.Encoding]::UTF8
$token = $username + ":" + $password
$encoded = $encoder.GetBytes($token)

$base64 = [System.Convert]::ToBase64String($encoded)
Write-Output $base64

which is then used like:

.\GetAuthCode.ps1 -username neo4j -password neo

So, with this information, our new ‘Get data’ bit looks like:

let
    Source = Web.Contents( "http://localhost:7474/db/data/transaction/commit",
             [
                 Headers = [#"Authorization" = "Basic bmVvNGo6bmVv"],
                 Content=Text.ToBinary("{
                          ""statements"" : [ {
                          ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
             }")]
             )
in
    Source

which when we ‘preview’ gives us this:

image

Step 6 – Read as Json

Select the ‘localhost’ file and then choose ‘open as Json’ from the top menu:

image

You’ll notice once you’ve done this – your ‘Source’ has changed to now be ‘Json.Document(Web.Contents…)’

image

Step 7 – Navigation

First click on the 'List' of 'results'.

This will take you to a screen that looks like this:

image

Note, you now have another ‘Step’ in the right hand bar – by the way – if you ever ’lose’ the Settings side bar – click on ‘View’ at the top and select ‘Query Settings’ to bring it back.

Then click on the ‘Record’ link, and then the ‘List’ for data:

image

Worth noting here: we're still in the 'Navigation' step.

Now you should have a list of ‘Record’s –

image

Step 8 – Table-ify

Go ahead and press the ‘To Table’ button, and then just ‘OK’ on the dialog that pops up:

image

Step 9 – Expand the Column

Records aren't useful to Power BI (apparently), so we need to expand that column out. To do that, click on the 'Expand' button – in our case we only want the 'row', not the meta, so unselect 'meta' and press OK.

image

Now you should see a row of ‘List’ and an extra step in our ‘Applied Steps’ list:

image

Step 10 – Add a custom column

So now we need to get the information out of these new ‘Lists’ – and to do that we need a custom column, so click on the ‘Custom Column’ button in the ‘Add Column’ tab:

image

In the dialog that pops up we want to have it say:

= Record.FromList([Column1.row], type[Name = text, Rank = number])

image

Then press OK, and you’ll have another Column called ‘Custom’, and another item in our Applied Steps:

image

Step 11 – Expand Custom

More records eh? Let’s expand it out, so as before, click on the ‘Expand’ button:

image

and in this case, we want all the columns:

image

Now you should have two new columns, and another step added:

image

Data! Yay!

Step 12 – Remove that non-useful row

Right click on the ‘Column1.row’ column and select Remove

image

Step 13 – Close & Apply

Now we have data in a format we can use in Power BI, let’s close and apply that query.

image

Step 14 – Use that data

Now – I’m no Power BI user – so this is super simple and pointless, but should get you going for experimenting.

After applying that query we’re back in the main desktop view, but now in the right hand side – we have some fields with our Query there:

image

Let’s VISUALIZE

I’m going to pick a ‘Treemap’ – because.

image

Empty treemap – Check!

image

Let’s set some data, I want to group by ‘Rank’, so I drag ‘Custom.Rank’ to the ‘Group’ section which is in the ‘Visualizations’ bar:

image

And then for ‘Values’ I’m going to drag the ‘Custom.Name’ field

image

Oooooh – colours:

image

Let’s expand our visualization by pressing the ‘Focus Mode’ button:

image

Boom! Full size

Now, if I hover over one of those boxes I get the brief info displayed:

image

Ace, only 2 names with a rank of 5, and to see who they are, right click and select ‘See Records’

image

And here they are:

image

No More Steps

If you want to just copy/paste the code, you can! Create a new blank query and open up the advanced editor and just paste the code below in. (NB There are probably loads of things which are rubbish about this implementation, lemme know!)

let
    Source = 
        Json.Document(
            Web.Contents("http://localhost:7474/db/data/transaction/commit",
            [
                Headers=[Authorization="Basic bmVvNGo6bmVv"],
                Content=Text.ToBinary("{""statements"" : [ {
                        ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
                        }")
            ])),
    results = Source[results],
    results1 = results{0},
    data = results1[data],
    #"Converted to Table" = Table.FromList(data, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"row"}, {"Column1.row"}),
    #"Added Custom" = Table.AddColumn(#"Expanded Column1", "Custom", each Record.FromList([Column1.row], type[Name = text, Rank = number])),
    #"Expanded Custom" = Table.ExpandRecordColumn(#"Added Custom", "Custom", {"Name", "Rank"}, {"Custom.Name", "Custom.Rank"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"Column1.row"})
in
    #"Removed Columns"

 

Writing a Stored Proc in Neo4j for .NET Developers

I’m a .NET developer and I have been for about 13 years or so now, predominantly in C#, but I originally (in my university days) started off programming in Java. Now, I’ve not touched Java for roughly 13 years, and I’m pretty happy with that situation.

3 years(ish) ago I started using Neo4j – as you might notice from previous blog posts. The ‘j’ does indeed stand for Java and I had the feeling that some day – some dark day – I would have to flex those Java muscles again.

Turns out they had atrophied to nothing.

Bum.

I’m guessing that I’m not the only .NET dev out there who fears the ‘j’ – so here’s a quick get-up-and-go guide to writing a Neo4j Stored Procedure.

We're going to write a stored procedure to return a movie and all its actors from the example 'movies' database you can add to your instance by running ':play movies' in the Neo4j Console.

The Tutorial

We're going to go through the steps to create a super simple stored proc that just gets the actors for a given movie.

Step 1. Install the JDK

I never thought I'd have to write that. Ho hum. So I google for 'Java SDK' and pick the 'Java SE Development Kit 8' link (top one for me). Then download the appropriate JDK for your environment, x64 or x86, and install that bad boy. Oooh, Java is used on X billion devices – good to know!

Step 2. Install Gradle

Just go to https://gradle.org/install and follow the instructions – if you have ‘scoop’ or ‘chocolatey’ installed, then you can use them, I went manual install.

Side note 1 – WTF is Gradle?

Gradle is a build-automation system for Java, I guess a bit like MSBuild – with Nuget built in. As we’ll see a bit later on, you add the dependencies which are then pulled from Maven (the Java Nuget (I think)) – most of the posts you see online referring to how to create stored procs for Neo4j will use a Maven setup, but APOC (a large community driven set of stored procs) uses Gradle, and I reckon if they’re using Gradle, it’s probably better than Maven. Or it’s newer and shinier. Either way – I’m going Gradle.

Step 3. Choose your IDE – IntelliJ

To be honest, if you go with anything other than IntelliJ IDEA – you may as well stop reading – as this is all written from an IntelliJ point of view. I’m using the Ultimate edition, but I have no doubt this will be pretty much the same on the Community (free!) version.

Download and install.

Step 4. Start IntelliJ

image

It does have a nicer splash screen than Visual Studio – and JetBrains write Resharper – so hopefully the changeover isn’t as jarring (ha!) as it could be.

Step 5. New Project!

image

But what new project?

image

What we’re going to go for is a ‘Gradle’ project, choosing Java and using the 1.8 SDK:

image

Step 6. GroupId and ArtifactId

Pressing ‘next’ gets us to a window allowing us to set the groupId and artifactId of our project.

Step 7…

Wait what? GroupId? ArtifactId? What on earth are they??? Shouldn’t there just be ‘Name’?

OK, you can think of these as kind of like a namespace and a dll (jar) name.

GroupId – a name to uniquely identify your project across all projects. Typically (it seems) this follows the convention of 'org.<companyName>.<projectName>', so (going all MS) I might have: 'org.contoso.movies'.

ArtifactId – Basically the name of the JAR file that is created, minus any versioning information. Lowercase only folks – cos it’s Java, and I guess to optimise keyboard usage they opted to shun the shift key.

image

As you can see, I’ve got a company name of ‘cskardon’ and a JAR name of ‘movies-procs’. I’ve left the Version as it was. Just because. Hit Next, next.

Step 7. More settings!

Don’t worry we’re nearly there,

I turned on 'Auto-import' and 'Create directories for empty content roots automatically'. I'm using the default gradle wrapper – this basically (as far as I know) puts a copy of gradle into your folder so you can run 'gradlew.bat' from the command prompt and have it do all the things. Either way, it does mean you don't have to install gradle if you're just using the code.

You will need to make sure the Gradle JVM is set to ‘1.8’ (see the picture below) it won’t work with the JAVA_HOME option.

image

Step 8. Locations!

Finally! Locations! Name wise – we’ll stick with what we selected for the artifactId in step 6, this makes life easier – and location wise – go for wherever you like – it’s your computer after all.

image

Note, we now have a ‘finish’ button – no more ‘next’ HUZZAH!

Step 9. Expand and config files

First off, let’s expand the ‘movies-procs’ node:

image

Now, double click on the ‘build.gradle’ file. We need to add some things here to get access to libraries for Neo4j. First up is a ‘project.ext’ element:

project.ext {
    neo4jVersion = "3.2.0"
}

This needs to be below the sourceCompatibility element, and above the repositories element. Speaking of which, we need some more repositories, so set the repositories element to:

repositories {
    mavenLocal()
    maven { url "https://m2.neo4j.org/content/repositories/snapshots" }
    mavenCentral()
    maven { url "http://oss.sonatype.org/content/repositories/snapshots/" }
}

Now we need to change the dependencies so we can use all the goodies.

dependencies {
    compile group: 'commons-codec', name: 'commons-codec', version:'1.9'
    compile 'com.jayway.jsonpath:json-path:2.2.0'

    compileOnly group: 'net.biville.florent', name: 'neo4j-sproc-compiler', version:'1.2'

    testCompile group: 'junit', name: 'junit', version:'4.12'
    testCompile group: 'org.hamcrest', name: 'hamcrest-library', version:'1.3'
    testCompile group: 'org.apache.derby', name: 'derby', version:'10.12.1.1'

    testCompile group: 'org.neo4j', name: 'neo4j-enterprise', version:neo4jVersion
    testCompile group: 'org.neo4j', name: 'neo4j-kernel', version:neo4jVersion, classifier: "tests"
    testCompile group: 'org.neo4j', name: 'neo4j-io', version:neo4jVersion, classifier: "tests"
    compileOnly(group: 'org.neo4j', name: 'neo4j', version:neo4jVersion)
    compileOnly(group: 'org.neo4j', name: 'neo4j-enterprise', version:neo4jVersion)

    compileOnly(group: 'org.codehaus.jackson', name: 'jackson-mapper-asl', version:'1.9.7')
    testCompile(group: 'org.codehaus.jackson', name: 'jackson-mapper-asl', version:'1.9.7')

    compileOnly(group: 'org.ow2.asm', name: 'asm', version:'5.0.2')

    compile group: 'com.github.javafaker', name: 'javafaker', version:'0.10'
    compile group: 'org.apache.commons', name: 'commons-math3', version: '3.6.1'
}

By the way – I’d like to point out I have largely got this from the APOC library, so it’s probably bringing in too much, and is probably overkill, but later on when you need something obscure, it’s probably already there. So… Win!

Step 10. Package 1

10 steps to get to programming, but on the plus side – each stored proc you add to this project doesn’t need the setup, and it’s a one-off for each project. Anyhews.

So we're going to add a package, which is basically Java's version of a namespace. In this case we're going to add one called 'common':

Expand the ‘src/main’ folders – and right click on the ‘java’ folder, then add –> new –> Package

image

Now you have the package there:

image

Step 11. A class

Now time to add a Java file – called ‘MapResult’ – this is entirely taken from APOC.

image

The type in this case is a class:

image

Highlight everything in the class that is created, and paste the below into it:

package common;

import java.util.Collections;
import java.util.Map;

public class MapResult {

    private static final MapResult EMPTY = new MapResult(Collections.<String, Object>emptyMap());

    public final Map<String, Object> value;

    public static MapResult empty() {
        return EMPTY;
    }

    public MapResult(Map<String, Object> value) {
        this.value = value;
    }
}

This allows us to wrap each row of our query's results so the procedure can stream them back.
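
If it helps to think of it in C# terms (purely an analogy – you don't need to write this), it's roughly the equivalent of:

using System.Collections.Generic;

// Rough C# analogy of the Java MapResult above: a thin wrapper around a
// dictionary so each row of a Cypher result can be streamed back out.
public class MapResult
{
    public static readonly MapResult Empty =
        new MapResult(new Dictionary<string, object>());

    public readonly IDictionary<string, object> Value;

    public MapResult(IDictionary<string, object> value)
    {
        Value = value;
    }
}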

Step 12. Package 2

OK, now we’re going to add another package to the ‘java’ folder, this time called ‘movie’, and a class within that called ‘ActorProcedures’ – not necessarily the best named class :/

image

Step 13. Code!

I’m just going to ask you to paste the below into your code window, and we’ll go over it in a minute or two:

package movie;

import common.MapResult;
import org.neo4j.procedure.Context;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.Procedure;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import static java.lang.String.format;
import static java.lang.String.join;

public class ActorProcedures {

    // Injected by Neo4j at runtime - gives the procedure access to the database it's running in
    @Context
    public org.neo4j.graphdb.GraphDatabaseService _db;

    // Prefixes the query fragment with a WITH clause so each parameter is available by name
    public static String withParamMapping(String fragment, Collection<String> keys) {
        if (keys.isEmpty()) return fragment;
        String declaration = " WITH " + join(", ", keys.stream().map(s -> format(" {`%s`} as `%s` ", s, s)).collect(Collectors.toList()));
        return declaration + fragment;
    }

    // Ends up callable as: CALL movie.getActors('Top Gun')
    @Procedure
    public Stream<MapResult> getActors(@Name("title") String title) {
        Map<String, Object> params = new HashMap<String, Object>();
        params.put("titleParam", title);
        return _db.execute(withParamMapping("MATCH (m:Movie)<-[:ACTED_IN]-(a:Person) WHERE m.title = {titleParam} RETURN a", params.keySet()), params).stream().map(MapResult::new);
    }
}

Step 14. Build

Yes – I know we've eschewed tests; we'll come to those later. For now we just want to do the standard 'build'. Because we're using gradle, we're going to use IntelliJ to help us run the build: first go to 'View', then 'Tool Windows', and select 'Gradle'.

image

In the window that pops up, expand the 'Tasks' and then 'build' collapsed elements, and double click on 'build':

image

You should get a ‘Run’ window popping up at the bottom of the screen, looking a bit like this:

image

You should also now have a ‘build’ folder in your project window, with a ‘libs’ folder inside, and hopefully inside that – the .jar file

image

Step 15. Manual Testing

I’m going to cover unit testing the procedure in another post, to try to limit the size of this one, but obviously now we have a .jar, we want to put that into our DB.

Right-click on the .jar and select ‘Show in Explorer’

image

Copy that .jar file and place it into the 'plugins' directory of your version of Neo4j. If you've used the 'zip' version – the folder is already there in the root; if you're using the installer version – you'll need to create a 'plugins' folder in the location listed in the application:

image

So copy the location and open it in explorer:

image

New folder called plugins.

Now paste the .Jar file into the plugins folder and stop (if you need to) Neo4j, then start it again to load it.

Go to the neo4j browser and login if you need to.

We’re now going to run ‘call dbms.procedures’

image

We get a list of the procs in the DB, so far so good – now it’s time to scroll on down the list…

image

Awesomeballs!

Now, let’s call that bad boy. BTW – I’m assuming you have the movies DB installed – if not, run

:play movies

now and get it all there. Done? Good.

To call our proc, we run:

call movie.getActors("Top Gun")

image

Which gets us results:

image

Now, that seems tested and working. But we probably want to start getting some unit tests in there asap, so I’ll cover that next.

So you want to go Causal Neo4j in Azure? Sure we can do that

So you might have noticed in the Azure market place you can install an HA instance of Neo4j – Awesomeballs! But what if you want a Causal cluster?

image

Hello Manual Operation!

Let’s start with a clean slate, typically in Azure you’ve probably got a dashboard stuffed full of other things, which can be distracting, so let’s create a new dashboard:

image

Give it a natty name:

image

Save and you now have an empty dashboard. Onwards!

To create our cluster, we’re gonna need 3 (count ‘em) 3 machines, the bare minimum for a cluster. So let’s fire up one, I’m creating a new Windows Server 2016 Datacenter machine. NB. I could be using Linux, but today I’ve gone Windows, and I’ll probably have a play with docker on them in a subsequent post…I digress.

image

At the bottom of the ‘new’ window, you’ll see a ‘deployment model’ option – choose ‘Resource Manager’

image

Then press ‘Create’ and start to fill in the basics!

image

  • Name: Important to remember what it is, I’ve optimistically gone with 01, allowing me to expand all the way up to 99 before I rue the day I didn’t choose 001.
  • User name: Important to remember how to login!
  • Resource group: I’m creating a new resource group, if you have an existing one you want to use, then go for it, but this gives me a good way to ensure all my Neo4j cluster resources are in one place.

Next, we’ve got to pick our size – I’m going with DS1_V2 (catchy) as it’s pretty much the cheapest, and well – I’m all about being cheap.

image

You should choose something appropriate for your needs, obvs. On to settings… which is the bulk of our workload.

image

I’m creating a new Virtual Network (VNet) and I’ve set the CIDR to the lowest I’m allowed to on Azure (10.0.0.0/29) which gives me 8 internal IP addresses – I only need 3, so… waste.

image

I'm leaving the public IP as it is, no need to change that, but I am changing the Network Security Group (NSG) as I intend on using the same one for each of my machines, and having '01' on the end (as is default) offends me 🙂

image

Feel free to rename your diagnostics storage stuff if you want. The choice as they say – is yours.

Once you get the ‘ticks’ you are good to go:

image

It even adds it to the dashboard… awesomeballs!

image

Whilst we wait, let's add a couple of things to the dashboard – well, one thing, the Resource group – so view the resource groups (menu down the side), press the ellipsis on the correct Resource group and Pin to the Dashboard:

image

So now I have:

image

After what seems like a lifetime – you’ll have a machine all setup and ready to go – well done you!

image

Now, as it takes a little while for these machines to be provisioned, I would recommend you provision another 2 now, the important bits to remember are:

  • Use the existing resource group:
    image
  • Use the same disk storage
  • Use the same virtual network
  • Use the same Network Security Group
    image

BTW, if you don't, you're only giving yourself more work, as you'll have to move them all to the right place eventually – may as well do it in one!

Whilst they are doing their thing, let’s setup Neo4j on the first machine, so let’s connect to it, firstly click on the VM and then the ‘connect’ button

image

We need two things on the machine

  1. Neo4j Enterprise
  2. Java

The simplest way I’ve found (provided your interwebs is up to it) is to Copy the file on your local machine, and Right-Click Paste onto the VM desktop – and yes – I’ve found it works way better using the mouse – sorry CLI-Guy

Once there, let’s install Java:

image

Then extract Neo4j to a comfy location, let's say the 'C' drive (whilst we're here… whaaaat!!???

image

An 'A' drive? I haven't seen one of those for at least 10 years, if not longer).

Anyways – extracted and ready to roll:

image

UH OH

image

Did you get ‘failed’ deployments on those two new VMs? I did – so I went into each one and pressed ‘Start’ and that seemed to get them back up and running.

#badtimes

(That’s right – I just hashtagged in a blog post)

Anyways, we’ve now got the 3 machines up and I’m guessing you can rinse and repeat the setting up of Java and Neo4j on the other 2 machines. Now.

To configure the cluster!

We need the internal IPs of the machines, we can run ‘IpConfig’ on each machine, or just look at the V-Net on the portal and get it all in one go:

image

So, machine number 1… open up ‘neo4j.conf’ which you’ll find in the ‘conf’ folder of Neo4j. Ugh. Notepad – seriously – it’s 2017, couldn’t there be at least a slight  improvement in notepad by now???

I’m not messing with any of the other settings, purely the clustering stuff – in real life you would probably configure it a little bit more. So I’m setting:

  • dbms.mode
    • CORE
  • causal_clustering.initial_discovery_members
    • 10.0.0.4:5000,10.0.0.5:5000,10.0.0.6:5000

I’m also uncommenting all the defaults in the ‘Causal Clustering Configuration’ section – I rarely trust defaults. I also uncomment

  • dbms.connectors.default_listen_address

So it’s contactable externally. Once the other two are setup as well we’re done right?

Ha! No chance! Firewalls – that's right, plural. Each machine has one – which needs to be set to accept the ports:

5000,6000,7000,7473,7474,7687

image

Obviously, you can choose not to do the last 3 ports and be uncontactable, or indeed choose any combo of them.

Aaaand, we need to configure the NSG:

image

I have 3 new ‘inbound’ rules – 7474 (browser), 7687 (bolt), 7000 – Raft.

Right. Let’s get this cluster up and contactable.

Log on to one of your VMs and fire up PowerShell (in admin mode)

image

First we navigate to the place we installed Neo4j (in my case c:\neo4j\neo4j-enterprise-3.1.3\bin) and then we import the Neo4j-Management module. To do this you need to have your ExecutionPolicy set appropriately. Being Lazy, I have it set to ‘Bypass’ (Set-ExecutionPolicy bypass).

Next we fire up the server in ‘console’ mode – this allows us to see what’s happening, for real situations – you’re going to install it as a service.

You’ll see the below initially:

image

and it will sit like that until the other servers are booted up. So I’ll leave you to go do that now…

Done?

Good – now, we need to wait a little while for them to negotiate amongst themselves, but after a short while (let’s say 30 secs or less) you should see:

image

Congratulations! You have a cluster!

Log on to that machine via the IP it says, and you'll see the Neo4j Browser; login and then run

:play sysinfo

image

You could now run something like:

Create (:User {Name: 'Your Name'})

And then browse to the other machines to see it all nicely replicated.
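
And, this being a .NET-flavoured blog, if you'd rather do that from code than the browser, a rough Neo4jClient sketch would look like the below – substitute an address you can actually reach and whatever password you set (the names here are illustrative, not gospel):

using System;
using Neo4jClient;

var client = new GraphClient(
    new Uri("http://10.0.0.4:7474/db/data"),   // swap in a reachable address for your cluster
    "neo4j",
    "your-password");                          // whatever you set yours to
client.Connect();

client.Cypher
    .Create("(u:User {Name: 'Your Name'})")
    .ExecuteWithoutResults();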

The three stages of *my* database choice

Originally posted on: http://geekswithblogs.net/cskardon/archive/2015/12/03/the-three-stages-of-my-database-choice.aspx

Prologue

I write and run a website called Tournr, a site to help people run competitions; it helps them organise and register for competitions, keeping the scores online and taking some of the pain out of competition management for them. I began it due to a badly run competition I attended (and ahem, ran) a few years ago – and I wanted to make it better for first-timers (like myself) and old-hands alike. This post is about the database decisions and pains I've been through to get where I currently am. It's long, and the TL;DR is: I changed my DB.

Chapter 1 – SQL Server

I'm a .NET developer through and through – no bad thing, but it does tend to lead you down a certain train of thought – namely the Microsoft stack (WISA – Windows, IIS, SQL Server, ASP.NET). Personally, I don't have the time, well, more the inclination, to learn a new language when I'm comfortable in .NET and it does what I want it to do – so I created my first version.

tournr

I also was predominantly a desktop developer; this was my first real foray into the world of web development, so the styling and colour choices, in a word – sucked. More importantly, the backend was progressing slowly. At the early stages of any project changes occur rapidly – some ideas which seem great begin to lose their shine after a week, or when someone else hears them and says 'no, just no'.

So Tournr was based on SQL Server using the Entity Framework as its ORM – again – standard practice. I started to get fed up with writing migration scripts. I'm more your Swiss Army Knife developer, good at a lot of things, but not super-specialized-amazeballs at one thing in particular – a generalist if you will – and I found the time spent migrating my database structure, writing SQL etc was delaying me from actually writing features. I know people who can reel out SQL easily and are super comfortable with it, and I'm ok, I can write queries for creating/deleting/joining etc, but not as efficiently as others.

Chapter 2 – RavenDB

Skip along 6 months, and I’d been playing with RavenDB at my workplace, and thought it looked like it might be a good fit for Tournr. So I took a month or so to convert Tournr to use Raven instead of SQL Server, and man alive! that was one of my best ever decisions, I felt freer in terms of development than I had for ages, instead of working out how my classes would fit, and whether I needed yet another lookup table, I could write my classes and just Save. Literally. A little note here: Raven has first class .NET integration, it is very easy to use.
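
To give a flavour of what 'just Save' means (a hedged snippet – the class and names are made up, not Tournr's actual code):

// The whole Raven persistence story, more or less: new it up, Store, SaveChanges.
using (var session = documentStore.OpenSession())
{
    session.Store(new Tournament { Name = "Spring Open" });   // no schema, no migration script
    session.SaveChanges();
}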

I procrastinated for a while after the initial conversion and finally got Tournr released using RavenHQ for hosting the DB and life was good – including a new Logo.

Print

I could add new features relatively easily. Over time I found myself adding things into my class structures to make the queries simpler, and ended up with a little bit of redundancy. As an example I would have a list of Competitors in a Class (not a code class, but a competition class – like Junior or Women's for example), and if a competitor was registered in two Classes, they would in essence be copied into both, so my Tournament would have 2 Classes with the same Competitor in both. I won't bore you with the details, but this encroachment started to happen a little bit more.

Brief interlude

I’m aware that anytime you write something about how you struggled with <technology>, the developers and users who love it and are passionate about it will think you’re:

a) doing it wrong
b) don’t understand the `<technology>`
c) vindictive because something went wrong
d) insert your own reason here!

It's natural – people make decisions which they get invested in, and they want their decisions to be positively reinforced, so if you read something saying 'Oh I left <technology> because it was <insert your own negative phrase here>', it's like they've slapped you and said you've made the wrong choice.

So to those people. It was just that way for me, it’s not a personal attack on you -or- Raven, or indeed SQL Server.

I was talking with my partner about a new feature I wanted to add in, and as we talked about it, the structure started to become apparent, she drew a circle and lines going into it. I made the glib statement somewhere along the lines of “the problem is that what you’ve drawn there is basically a graph, it’s a bit more complex than that”. To which she responded “Why don’t you use the graph db?”.

I had no good answer. I’d been using Neo4j for a good few years so it’s not like I didn’t get it. Obviously it’s a big decision, switching from 1 DB to another is never a small thing, let alone from one type (document) to another (graph). Sure – I’d done it in the past from Relational to Document, but at that point *no-one* was using it, so it only affected me. This time I’d have users and Tournaments.

Now, Tournr isn’t used by many people at the moment, this is a blessing and a curse – the curse being that I’d love it to be used by more people 🙂 The blessing is that I can monitor it very closely and keep tabs on how the conversion has gone. Hooking in things like RayGun means that getting near instant notification of any error combined with quick code turn-around I can respond very quickly.

Long and short of it. I thought ‘<expletive> it!’, and set to work…..

Before jumping there, let's look at the positives and negatives of using Raven:

Positives:
  • Extremely fast to get up and running (I think it’s fair to say without Raven Tournr would not have been launched when it was)
  • Fits into C# / .NET code very well
Negatives:
  • You really need to buy into Ayende’s view of how to use the Database, this isn’t a bad thing in itself, but it does restrict your own designs.

 

Chapter 3 – Neo4j

At the point you take the plunge it's important to get a quick win, even if (as it turns out) it's superficial and full of lies and more LIES! I'm going to give a bit of an overview of Tournr's structure, not going super deep – you don't need to know that. Tournr was initially an ASP.NET MVC3 application, which was migrated to MVC5; along the way it stuck with the ASP.NET Membership system, using first the Entity Framework version and then a custom rolled RavenDB based version.

Whilst doing this conversion the *only* thing I allowed myself to do aside from the DB change was update the Membership to use ASP.NET Identity – and that was for two reasons –

1. There wasn’t a Neo4j based Membership codebase that I could see – so I’d have had to roll my own, and
2. There is a Neo4j Identity implementation (which I *have* helped roll).

Membership

Long story short – I added the Neo4j.Aspnet.Identity nuget package to my web project and switched out the Raven Membership stuff, this involved adding some identity code, setting up OWIN and other such-ness. The real surprise was that this worked. No problems at all – this was the quick win. I thought to myself – this is something that is not impossible.

Conversion – The rest

What? Membership and 'The rest' – it's not exactly partitioning the post is it Chris? Well – no, and the reason is this – when I switched the membership – it compiled, started and let me login, register etc. Obviously I couldn't load any tournaments, or rather I could, but I couldn't tie the user accounts to them. When I switched the pulling of Tournaments etc, all bets were off.

I like to go cold turkey. I removed the RavenDB nuget package from the project and winced at the hundreds of red squiggles and build errors. All that could be done from this point was a methodical step by step process of going through controllers replacing calls to Raven with calls to my new DB access classes. Anyhews, that aside – I ended up with an interface with a bucket load of methods.

Model 1

Woah there! You're thinking – I think you missed a step there, what about the data model design – yes – you're of course right. Prior to my conversion I had drawn out a model – we'll call this Model 1. This was (as you can probably guess from the name) wrong. But that didn't stop me, and that's partly down to my personality – if I'm not doing something – I find it easy to get bored and then spend time reading the interwebs. Also – I know I'm going to find out some stuff that will change the model, so no point in being too rigid about it.

In this model – I'd separated out a lot of things into individual nodes. For example – a User has a set of properties which are grouped together in a class representing Personal Registration details – things like country flag etc – and I had the model:

(User)-[:HAS_PERSONAL_DETAILS]->(PersonalDetails)

So I wrote a chunk of code around that.
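
That chunk of code was, in spirit, Neo4jClient calls along these lines (a sketch – the class shapes and property names here are invented, the real ones have rather more on them):

public class User { public string Email { get; set; } }
public class PersonalDetails { public string Country { get; set; } }

// Model 1 style: the details hang off their own node, so every read is a traversal
var details = graphClient.Cypher
    .Match("(u:User)-[:HAS_PERSONAL_DETAILS]->(pd:PersonalDetails)")
    .Where((User u) => u.Email == "jo@example.com")
    .Return(pd => pd.As<PersonalDetails>())
    .Results;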

Something you will find is that Neo4j doesn’t store complex types – simple arrays of simple types are cool, Dictionaries and objects are out. So you can quite easily separate out into individual nodes like above, and first cut – well – that’s the route I took.
So I plugged away, until I hit some of the bigger classes, this is where Raven had given me an easy run – Oh hey! You want to store all those nested classes? NO PROBLEM! That is awesomely powerful – and gives super super fast development times. Neo4j not so forgiving. So, taking ‘Model 1’ as the basis I start to pick out the complex objects. Then EPIPHANY

Model 2 – The epiphany

In my view, for complex types which really are part of a Tournament, or indeed a User – and in particular things I wasn't going to search by – why create a new Node? Trade off – bigger nodes, but fewer of them – queries (or Cypher) become a bit simpler, but you can't query as easily against the complex embedded types.

Maybe I needed an in-between – where some complex types *were* nodes, and some were just serialized with the main object. Weird. A _middle ground_ – can you have that in development?

So Model 2 takes Model 1 and combines some of the types which really didn’t need to be separate nodes. So Personal Details moved into the User, as I had no need to query on the data in there (and if I _do_ need to at a later date, well – I can add it then).

Special note for .NET devs – if you try to put a CREATE into Neo4j with a type that has a complex type for a property – Neo4j will b0rk at you. To get around this – you'll need to provide a custom Json Converter to the Neo4jClient (obvs if you're not using Neo4jClient this is totally irrelevant to you). There are examples of this on StackOverflow – and I imagine I'll write some more on it later – probably try to update the Neo4jClient Wiki as well!

Now, so far I imagine there are Top-Devs (TM)(R) slapping their foreheads over the general lack of planning, well hold onto your pants, let’s enter the heady world of TESTING.

I know what TDD is, I know what BDD is, I’m pretty certain I know what DDD is – but for Tournr I don’t really practice them. A few reasons – and I don’t really want to get into some sort of standards war here, but in a nutshell – Tournr wouldn’t be live if I’d tested the heck out of it. In the areas that matter – the important calculations etc, I have tests, but for some things – I just don’t. Quick note for potential hirers:  I do write tests professionally, use NCrunch etc, but this is very much a personal project and I take all the heat for it, and it’s a risk I’m willing to take at the moment.

So, back to Tournr: once I'd been through the controllers and got it all compiling, I started testing my codebase. Funny thing – when you write a lot of code which for the majority of the time *doesn't compile*, issues do creep in. Mostly (in this case) it was related to complex types I'd missed or a missing closing brace in the Cypher.

>> Cypher

I'm not going to go into this very deeply either, but Cypher is amazeballs – think of it as the SQL of the Graph DB world (well, the Neo4j world – actually not anymore – you can tell this post has been in the drafts for a while – check out OpenCypher). It's clear and concise, and yes – like SQL you can go wrong. You might think that you don't want to learn Yet Another Language when you know SQL – so why not use something like OrientDB – but think about it another way. You use SQL to interact with a Relational DB, with tables, foreign keys etc. You perform Joins between tables – to use that in a Graph DB would be quite a mental leap – and confuses matters – you end up having the same keyword meaning different things for different databases – you could end up writing a 'select' statement in your code against both DB types. With Cypher the language is tailored to the DB, and as such describes your queries from a Node / Relationship point of view, not a Tables point of view.

The changes I mainly made involved adding attributes like 'JsonIgnore' to my classes to prevent Neo4j serializing them (or attempting to), partly as it meant I could get development up and running faster, but also from the point of view of Migration. One of the problems with the conversion (indeed *any* conversion) is keeping existing elements, and that means translation. Raven stores documents keyed by the type – so if I store a 'Tournament', it is stored as a Tournament. When I query – I bring back a Tournament. Ah, but I've just JsonIgnored my properties – so when I bring it back – it's missing things.
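
To make that concrete, the attribute sprinkling looks something like this (a sketch – the real classes are bigger, and the properties here are illustrative):

using Newtonsoft.Json;
using System.Collections.Generic;

public class Class      // a competition class (Junior, Women's etc), not a code one
{
    public string Name { get; set; }
}

public class Tournament
{
    public long Id { get; set; }
    public string Name { get; set; }

    // These live as their own nodes via (Tournament)-[:HAS_CLASS]->(Class),
    // so stop Neo4jClient trying to serialize them onto the Tournament node.
    [JsonIgnore]
    public List<Class> Classes { get; set; }
}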

Migration

Obviously – I have elements in database A and I want them in database B, how do we achieve those goals? Bearing in mind – I don’t want them to change their passwords or not be able to login. Luckily I store passwords as plain text — HA! Not really, in practical terms, I have changed the way the passwords are hashed by switching to the Identity model, and as a consequence – there is nothing I can do :/ Existing users have to reset their passwords – now – this is BAD. How do you write an email like that? ‘Hi, I decided unilaterally to change the backend – now you need to reset your password – sucks to be you!’ – of course not. A more diplomatic approach is needed – specifically, the migration should only take place in the quietest possible period – once again a bonus of the ‘not used much’ scenario I find myself in.

All the other migration requirements are relatively simple, of course I have to split out bits that need to be split out, create relationships etc, but none of that affects the users.

The biggest headache I thought would be getting stuff from Raven and then putting it into Neo4j. Take a Tournament for example: in it, I had a List of Class, which in the Neo4j world is now represented as (Tournament)-[:HAS_CLASS]->(Class), so in the codebase for the Neo4j version I removed the 'Classes' property. But now I can't deserialize from Raven, as Tournament no longer has Classes.

This is where judicious use of Source Control (which we’re *all* using right?????) comes into play. Obviously at this point I’ve done a shed load of checkins – on a different branch – ready for the big ol’ merge, so it’s relatively easy to browse the last checkin before the branch and copy the Tournament class from there.

If I just whack in the class, the compiler will throw a wobbly, not to mention the Neo4j and Raven code will be unsure of which Tournament I mean.

So, let's rename to 'RavenTournament' (cleeeever), but coming back to the point made a while ago – Raven can't deserialize into RavenTournament as it's looking for Tournament. Oh, but wait. It can. Of course it can – and simply too. The standard query from Raven's point of view would be:

session.Query<Tournament>()

to get all the Tournaments. If I switch to:

session.Query<RavenTournament>()

it will b0rk, but, if I add:

session.Query<Tournament>().ProjectFromIndexFieldsInto<RavenTournament>()

I hit the mother lode: property-wise RavenTournament is the same as Tournament was pre-Neo4j changes, and Raven can now deserialize.
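
So the one-off migration loop ends up looking roughly like this (hedged – the Cypher and property names are illustrative, and I'm ignoring Raven's page-size limits for brevity):

using (var ravenSession = documentStore.OpenSession())
{
    // Read the old shape out of Raven, projecting into the copied pre-conversion class
    var oldTournaments = ravenSession.Query<Tournament>()
        .ProjectFromIndexFieldsInto<RavenTournament>()
        .ToList();

    // Write the new shape into Neo4j
    foreach (var old in oldTournaments)
    {
        graphClient.Cypher
            .Create("(t:Tournament {tournament})")
            .WithParam("tournament", new Tournament { Id = old.Id, Name = old.Name })
            .ExecuteWithoutResults();
    }
}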

A little word about IDs

By default Raven uses ids in the format: <type>/long, so a Tournament might be: Tournament/201. You can (and I did) change this, so for example I used '-' as the splitter: Tournament-201, and actually for Tournament – I just used a long. I can't really change the IDs, or rather – I don't want to; doing so means that existing links to tournaments are made invalid. Of course I could add some sort of mapping code, but that seems like more effort than I should need. So, Tatham to the rescue (this is Tatham Oddie of Neo4jClient fame) with SnowMaker – an Azure Storage based ID generator. I won't go into the how's your fathers about how I've used it – it's a pretty simple concept that you can look up and enjoy. Needless to say, it's made the conversion work.
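
For the curious, the basic SnowMaker usage is roughly the below – this is from memory of its readme, so treat the exact type and method names as assumptions and check the project itself:

using Microsoft.WindowsAzure.Storage;
using SnowMaker;

// Blob-backed store of counters, one scope per thing you want ids for
var dataStore = new BlobOptimisticDataStore(
    CloudStorageAccount.Parse("UseDevelopmentStorage=true"), "unique-ids");

var generator = new UniqueIdGenerator(dataStore);

long nextTournamentId = generator.NextId("tournament");   // e.g. becomes Tournament-201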

Epilogue – Post Conversion Analysis

So, codewise am I in better shape after the conversion – was it worth it? I think so – but thinking isn't the same as knowing – so let's fire up some analysis with the excellent NDepend. First we've got to define the baseline, and in this case we're going to set that as the last RavenDB based version of the codebase (comparing to the SQL version would be pointless as too much has changed in between), and then define the codebase to compare to – well, that's the most recent Neo4j based version (actually it's a version I'm currently working on – so it includes some new features not in the Raven one – so think of it as 'Raven Version+').

The first and most basic stats come from the initial dashboard –

image

Here I can see general metrics, and things like LOC, Complexity have gone down – generally – a good thing, but the number of types has increased a lot.

Less code but more types? Query-wise, with Raven you can just pull out the same objects as you insert – with Neo4j, I've found myself using more intermediaries, which is fine, and is part of the process. In fairness, as time has gone on I've realised a few of these are not used as much as I thought – and I can group them better – so if I was stopping dev now and just maintaining, I'd expect the number to drop. In practical terms – does it matter? Probably not – a few lines of code here and there might make it more maintainable – but it's worth thinking about the ramifications of switching to a rarely used DB (Raven or Neo4j) – I could have a 50% drop in code size, but the technology still requires more of a leap to get used to than a basic relational DB implementation.

What about new dependencies? What have I added, what have I removed?

One of the great things about NDepend (among many) is the ability to write CQL (Code Query Language) – a ‘LINQ’ style querying language for your code base – so Third Party types used that weren’t before:

from t in ThirdParty.Types where t.IsUsedRecently()
select new {
t,
t.Methods,
t.Fields,
t.TypesUsingMe
}

gives us:

image

And types which were used before and now aren’t:

from t in codeBase.OlderVersion().Types where t.IsNotUsedAnymore()
select new {
t,
t.Methods,
t.Fields,
TypesThatUsedMe = t.TypesUsingMe
}

image

There are more metrics, and NDepend has a lot of things to look at – I'm wary of making this post overly long, and I neglected to set my baseline properly to show the trend charts (bad me). Ongoing though, I'm keeping track of my quality to ensure it doesn't take a dive. (By the by – Patrick of NDepend has given me a copy of NDepend, you should know that; it is genuinely useful though – but do you know if I'm saying that or am I a suck-up lackey???)

 

Things I’ve Learned

  1. First and foremost is that taking on a conversion project of something you have built from scratch is totally doable – it’s hard, and quite frankly dispiriting to have your codebase not compile for a couple of days, and then spend days fixing up the stuff that’s b0rked.
  2. You can spend a long time thinking about doing something – sometimes you just have to do it – and it’s ok to be wrong.
  3. Don’t be afraid to throw stuff away, if no-one is using something – delete it, if your model is wrong, redo.
  4. Fiddler is your friend with Neo4j in Async mode.
  5. I used short cuts – the migration was a one off – the code is a quick console app that does it in a dodgy but successful way that had ZERO tests. That’s right ZERO.

Neo4jClient now supports JsonProperty for Sending Cypher

Originally posted on: http://geekswithblogs.net/cskardon/archive/2015/09/17/neo4jclient-now-supports-jsonproperty-for-sending-cypher.aspx

So, you want a demo graph, and you fire up Neo4j, go to the admin (http://localhost:7474/) and then run

:play movies

This gets you a graph to play around with, and so you head into Visual Studio (or maybe even LinqPad), and you have an urge now to get all the movies in the DB released after the year 2000. A compulsion if you will. So you create your class to get the movies:

class Movie
{
    public string Title { get; set; }
    public int Released { get; set; }
}

And run the query you’ve been dreaming of since 5 minutes ago:

var movies = graphClient.Cypher
    .Match("(m:Movie)")
    .Where((Movie m) => m.Released > 2000)
    .Return(m => m.As<Movie>())
    .Results;

No results? A moment of realisation dawns – j. There’s a ‘j’. At the end of Neo4, they put a ‘j’ – does this, does this mean LOWER CASE PROPERTIES!!!??? You try it just to see – lowercasing your Movie properties – it works. Ah man.

Fear not traveller! Json.Net to the rescue (and Neo4jClient 1.1.0.9 or greater)!

We must adorn our classes with Json Properties, like so:

class Movie
{
    [JsonProperty("title")]
    public string Title { get; set; }

    [JsonProperty("released")]
    public int Released { get; set; }
}

Now, we can run our dream query and discover all the movies released post 2000.

Obviously you can use / abuse the JsonProperty attribute as you wish – fancy storing things in the database with a property called ‘Foo’ but retrieving it as ‘Bar’, no problem (except with you)

[JsonProperty("Foo")]
public string Bar { get; set; }

One thing you should note, fellow .Net-iens – Neo4j has no problem with UpperCamelCase properties; this is mainly for use if you're up against the demo code, or indeed you want to change things in your code but leave the DB 'as-is'.

Anyhews – that’s all in Nuget now, so that’s good.

Neo4jClient 1.1.0.1

Originally posted on: http://geekswithblogs.net/cskardon/archive/2015/08/07/neo4jclient-1.1.0.1.aspx

Big milestone, this one: Neo4jClient now supports Transactions, Authentication and some other little changes.

Transaction info is all here: https://github.com/Readify/Neo4jClient/wiki/Transactions

You can find the connecting stuff here (at the bottom):

https://github.com/Readify/Neo4jClient/wiki/connecting
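
In code, the two headline bits look roughly like this (a sketch – the wiki pages above are the proper reference):

// Authentication: pass the credentials in when you create the client
var client = new GraphClient(new Uri("http://localhost:7474/db/data"), "neo4j", "password");
client.Connect();

// Transactions: batch up work and commit it (disposing without Commit rolls it back)
using (var tx = client.BeginTransaction())
{
    client.Cypher
        .Create("(m:Movie {Title: 'Top Gun'})")
        .ExecuteWithoutResults();

    tx.Commit();
}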

And the other big(ish) change is the change to make the ‘CollectAs’ method return the class type specified (<T>) instead of Node<T>.
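
So where you previously got a collection of Node<T> back, you now get the T itself – something like the below (hedged example, assuming a Person class to match the movies data):

var moviesWithActors = graphClient.Cypher
    .Match("(m:Movie)<-[:ACTED_IN]-(a:Person)")
    .Return((m, a) => new
    {
        Movie = m.As<Movie>(),
        Actors = a.CollectAs<Person>()   // now a collection of Person, not Node<Person>
    })
    .Results;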

Big thanks for this go to Arturo Sevilla (https://github.com/arturosevilla) for the original Pull request that has been merged in.

There’ll be more info in a while. But for now, download the new version via nuget: https://www.nuget.org/packages/Neo4jClient and have graphy fun!