So you want to go Causal Neo4j in Azure? Sure we can do that
So you might have noticed that in the Azure Marketplace you can install an HA instance of Neo4j – Awesomeballs! But what if you want a Causal Cluster?
Hello Manual Operation!
Let’s start with a clean slate, typically in Azure you’ve probably got a dashboard stuffed full of other things, which can be distracting, so let’s create a new dashboard:
Give it a natty name:
Save and you now have an empty dashboard. Onwards!
To create our cluster, we’re gonna need 3 (count ‘em) 3 machines, the bare minimum for a cluster. So let’s fire up one: I’m creating a new Windows Server 2016 Datacenter machine. NB. I could be using Linux, but today I’ve gone Windows, and I’ll probably have a play with Docker on them in a subsequent post… I digress.
At the bottom of the ‘new’ window, you’ll see a ‘deployment model’ option – choose ‘Resource Manager’
Then press ‘Create’ and start to fill in the basics!
- Name: Important to remember what it is, I’ve optimistically gone with 01, allowing me to expand all the way up to 99 before I rue the day I didn’t choose 001.
- User name: Important to remember how to login!
- Resource group: I’m creating a new resource group, if you have an existing one you want to use, then go for it, but this gives me a good way to ensure all my Neo4j cluster resources are in one place.
Next, we’ve got to pick our size – I’m going with DS1_V2 (catchy) as it’s pretty much the cheapest, and well – I’m all about being cheap.
You should choose something appropriate for your needs, obvs. On to settings… which is the bulk of our workload.
I’m creating a new Virtual Network (VNet) and I’ve set the CIDR to the smallest range I’m allowed to on Azure (10.0.0.0/29), which gives me 8 internal IP addresses. Azure keeps 5 of those for itself in every subnet, so that leaves exactly the 3 I need.
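If you want to sanity-check that arithmetic, here’s a quick sketch using Python’s standard ipaddress module (Python is just my pick for illustration, nothing in this setup needs it). It assumes Azure’s rule of reserving the first four addresses and the last one in every subnet, which is why the first VM ends up on 10.0.0.4.

```python
import ipaddress

# The VNet/subnet range used in this post.
subnet = ipaddress.ip_network("10.0.0.0/29")

# Iterating a network yields every address in it, including the
# network and broadcast addresses.
all_addresses = list(subnet)

# Assumption: Azure reserves the first four addresses and the last one
# in each subnet, so only the ones in between can be given to VMs.
usable = [str(ip) for ip in all_addresses[4:-1]]

print(subnet.num_addresses)  # 8
print(usable)                # ['10.0.0.4', '10.0.0.5', '10.0.0.6']
```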
I’m leaving the public IP as it is, no need to change that, but I am changing the Network Security Group (NSG) as I intend to use the same one for each of my machines, and so having ‘01’ on the end (as is default) offends me.
Feel free to rename your diagnostics storage stuff if you want. The choice as they say – is yours.
Once you get the ‘ticks’ you are good to go:
It even adds it to the dashboard… awesomeballs!
Whilst we wait, let’s add a couple of things to the dashboard. Well, one thing: the resource group. View the resource groups (menu down the side), press the ellipsis on the correct resource group and pin it to the dashboard:
So now I have:
After what seems like a lifetime, you’ll have a machine all set up and ready to go – well done you!
Now, as it takes a little while for these machines to be provisioned, I would recommend you provision another 2 now, the important bits to remember are:
- Use the existing resource group
- Use the same disk storage
- Use the same virtual network
- Use the same Network Security Group
BTW, if you don’t, you’re only giving yourself more work, as you’ll have to move them all to the right place eventually. May as well do it in one!
Whilst they are doing their thing, let’s set up Neo4j on the first machine. To connect to it, click on the VM and then the ‘Connect’ button.
We need two things on the machine
- Neo4j Enterprise
- Java
The simplest way I’ve found (provided your interwebs is up to it) is to copy the file on your local machine and right-click paste it onto the VM desktop. And yes, I’ve found it works way better using the mouse – sorry CLI-Guy.
Once there, let’s install Java:
Then extract Neo4j to a comfy location, let’s say, the ‘C’ drive. (Whilst we’re here… Whaaaaat!!??? An ‘A’ drive? I haven’t seen one of those for at least 10 years, if not longer.)
Anyways – extracted and ready to roll:
UH OH
Did you get ‘failed’ deployments on those two new VMs? I did – so I went into each one and pressed ‘Start’ and that seemed to get them back up and running.
#badtimes
(That’s right – I just hashtagged in a blog post)
Anyways, we’ve now got the 3 machines up and I’m guessing you can rinse and repeat the setting up of Java and Neo4j on the other 2 machines. Now.
To configure the cluster!
We need the internal IPs of the machines. We can run ‘ipconfig’ on each machine, or just look at the VNet on the portal and get them all in one go:
So, machine number 1… open up ‘neo4j.conf’ which you’ll find in the ‘conf’ folder of Neo4j. Ugh. Notepad – seriously – it’s 2017, couldn’t there be at least a slight improvement in notepad by now???
I’m not messing with any of the other settings, purely the clustering stuff – in real life you would probably configure it a little bit more. So I’m setting:
- dbms.mode=CORE
- causal_clustering.initial_discovery_members=10.0.0.4:5000,10.0.0.5:5000,10.0.0.6:5000
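In neo4j.conf these settings are plain key=value lines, and the members list is comma-separated all the way through. As a small sketch (Python purely for illustration; core_ips, discovery_port and conf_lines are names I’ve made up), this builds the two lines from the internal IPs, which saves hand-typing the separators and getting one wrong:

```python
# Internal IPs of the three core machines, as shown on the VNet in the portal.
core_ips = ["10.0.0.4", "10.0.0.5", "10.0.0.6"]

# Discovery port used throughout this post.
discovery_port = 5000

# Join every "ip:port" pair with a comma, the separator the setting expects.
members = ",".join(f"{ip}:{discovery_port}" for ip in core_ips)

conf_lines = [
    "dbms.mode=CORE",
    f"causal_clustering.initial_discovery_members={members}",
]

# Print the lines ready to paste into conf\neo4j.conf on each machine.
print("\n".join(conf_lines))
```

The same two lines go on all three machines, so it only needs running once.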
I’m also uncommenting all the defaults in the ‘Causal Clustering Configuration’ section – I rarely trust defaults. I also uncomment
- dbms.connectors.default_listen_address
So it listens on all interfaces (0.0.0.0) and is contactable externally. Once the other two are set up as well, we’re done, right?
HA! No chance! Firewalls, and that’s right, plural. Each machine has one, which needs to be set to accept the ports:
5000,6000,7000,7473,7474,7687
Obviously, you can choose not to do the last 3 ports and be uncontactable, or indeed choose any combo of them.
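Once the rules are in, it’s handy to check from one VM that another is actually reachable. Here’s a minimal sketch using only Python’s standard socket module (Python is my choice for illustration; port_is_open and check_ports are made-up helper names, and the port labels are the Neo4j 3.1 defaults as I understand them). Bear in mind a port only shows as open when something is listening on it, so run it once Neo4j is up on the target machine.

```python
import socket

# Ports opened in this post. Labels are an assumption based on the
# Neo4j 3.1 defaults: 5000 discovery, 6000 transactions, 7000 Raft,
# 7473 HTTPS, 7474 HTTP (browser), 7687 Bolt.
CLUSTER_PORTS = [5000, 6000, 7000, 7473, 7474, 7687]


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: all count as 'not open'.
        return False


def check_ports(host, ports=CLUSTER_PORTS):
    """Map each port to whether it accepted a connection."""
    return {port: port_is_open(host, port) for port in ports}


# Example, run from the first VM against the second:
# print(check_ports("10.0.0.5"))
```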
Aaaand, we need to configure the NSG:
I have 3 new ‘inbound’ rules: 7474 (browser), 7687 (bolt) and 7000 (Raft).
Right. Let’s get this cluster up and contactable.
Log on to one of your VMs and fire up PowerShell (in admin mode)
First we navigate to the place we installed Neo4j (in my case c:\neo4j\neo4j-enterprise-3.1.3\bin) and then we import the Neo4j-Management module (Import-Module .\Neo4j-Management.psd1). To do this you need to have your ExecutionPolicy set appropriately. Being lazy, I have it set to ‘Bypass’ (Set-ExecutionPolicy bypass).
Next we fire up the server in ‘console’ mode (Invoke-Neo4j console), which allows us to see what’s happening. For real situations, you’re going to install it as a service.
You’ll see the below initially:
and it will sit like that until the other servers are booted up. So I’ll leave you to go do that now…
Done?
Good – now, we need to wait a little while for them to negotiate amongst themselves, but after a short while (let’s say 30 secs or less) you should see:
Congratulations! You have a cluster!
Browse to that machine via the IP it says, and you’ll see the Neo4j Browser. Log in and then run
:play sysinfo
You could now run something like:
CREATE (:User {Name: 'Your Name'})
And then browse to the other machines to see it all nicely replicated.