Neo4j 4.0 is around the corner and I want to use SSIS with it. Can I?

Yes!

That’s the TL;DR; out of the way πŸ™‚ The slightly longer version is that you need to use the 1.4.0.0 version of the Neo4j SSIS Components.

Neo4j 4.0 support is the main change in the 1.4.0.0 components – there are also a couple of things tidied up, and hopefully a smoother experience with the syntax highlighting in the Cypher editor screens.

If you want to get your hands on the components – please visit http://bit.ly/neo4jssis and register, and you’ll be sent the download link. Registration is only used to let you know of updates to the tools, no marketing!

Neo4j – SSIS – Connection Manager Love

I’ve not written about the SSIS components for a bit, but I have been working on them – adding things, improving things. One area that was always very basic was the Connection interface – or lack thereof; indeed, the interface was basically just the properties pane of Visual Studio – mmmm, functional.

A quick reminder

Now though, you can double click on the Connection Manager and you’ll get a better interface – one that makes it easier to read!

I’ve also added two new properties to help with connections – IPv6 and Use Encryption – which are hopefully pretty obvious as to what they do.

This is a short post as there’s not much to add – those of you on the mailing list will have found out about these updates last year. If you want to get your hands on the toolset – please visit http://bit.ly/neo4jssis and register, and you’ll be sent the download link. Registration is only used to let you know of updates to the tools, no marketing!

Neo4j & NiFi – Getting NiFi Running

Neo4j and Apache NiFi

Another day, another ETL tool, this time Apache NiFi which is described as:

An easy to use, powerful, and reliable system to process and distribute data.

I’ve used SSIS and Kettle in the past, so I figured I’d be able to get this bad boy running easily enough – I mean – it’s ‘easy to use’ right? That’s not a description often applied to SSIS or Kettle, so I should be safe.

Unfortunately – as with politics, bare-faced lies are apparently acceptable in the software world as well. I’d probably replace ‘easy to use’ with ‘usable’ – as for the other keywords – powerful / reliable – I can neither confirm nor deny the truth of those.

Obviously – my experience is very very small – so take it with a pinch of salt – what I will say is that the documentation is OK, but doesn’t really describe just what the ____ is going on.

Complaints over – let’s crack on.

Important information

This is using NiFi 1.10.0, there are no graph bundles for lower versions – and in this version they are not packaged by default.

Running NiFi

I tried 4 ways to run NiFi. Initially, I went with running it off my local machine – which – yes – runs Windows 10. But it also has Java installed, so that saved me doing any extra leg work.

Windows 10 – Local

I downloaded the zip, unzipped it to a folder, and ran bin\run-nifi.bat – it started, showed me a lot of Java output – but that’s cool – I’m used to that. I connected to the URI (http://localhost:8080/nifi) and – good news! It’s there.

Stop it (CTRL+C), restart – and it never ever succeeds in starting again. For why? I have no idea. I extracted it into a fresh directory and ran it – no dice. I rebooted – no dice. Nothing I could do would sort it out – I had no instances of Java running, and netstat said the ports were not in use. Nada.

Windows Server 2019 – VM

On second thoughts – and with much frustration from the local attempts – I decided that maybe, just maybe, running this away from my work machine would make more sense. I have a Server 2019 ISO lying around – so – why not.

Long story shorter – same deal – though without even the initial success. OK – so the obvious common factor here is Windows; let’s go Ubuntu.

Ubuntu 18.04 VM

Install. Run. Fail.

There’s a pattern. I followed the instructions to the letter – but no joy. At this point there were two options: the hammer, or Docker. I wanted to be able to use my laptop later, so I put the hammer back and went with Docker.

Docker for Windows Desktop

Back on the main work machine O/S – I install docker – wait what? You didn’t have Docker installed already? Well – no – I’d used it a while ago – but found it messed up my machine way too much. But hey – after 3 fails, it’s time for a win – and I need a win.

Crack open PowerShell and run:

docker run --name nifi-run `
-p 8080:8080 `
-d `
apache/nifi:latest

Open up Chrome and go to the NiFi homepage (with the expectation of failure) – and… we have a winner!

Having been caught out by this before – I stop and start it again, and it’s still working! MAGIC.

Adding the Graph Bundles

Now we have NiFi actually running, we need to get the Graph Bundles in place, and firstly, we need to download those, so go get them from the releases page of GitHub.

Download

You only need the first 3 for connecting to Neo4j, and why would you want to connect to another GraphDB eh? Oh yeah – Sadomasochism – forgot.

Stick these files somewhere simple to remember – in my case, I went with D:\Docker – as we’ll need to reference them when the Docker container is made.

Also – I’m adding an ‘Import’ volume to the Docker container – to allow me to pass data into NiFi – my initial intention was (and in many ways still is) to be able to read a CSV file from this folder – and insert that into Neo4j.

Create the Docker Container

Pretty much the same command as before, only this time I’m adding 4 -v parameters to the call: 3 of them put the .nar files (downloaded above) into the container, and the last is the ‘Import’ folder.

docker run --name nifi `
-p 8080:8080 `
-v D:/Docker/nifi-graph-client-service-api-nar-1.10.0.nar:/opt/nifi/nifi-1.10.0/lib/nifi-graph-client-service-api-nar-1.10.0.nar `
-v D:/Docker/nifi-graph-nar-1.10.0.nar:/opt/nifi/nifi-1.10.0/lib/nifi-graph-nar-1.10.0.nar `
-v D:/Docker/nifi-neo4j-cypher-service-nar-1.10.0.nar:/opt/nifi/nifi-1.10.0/lib/nifi-neo4j-cypher-service-nar-1.10.0.nar `
-v D:/Docker/Import:/opt/nifi/nifi-1.10.0/data-in `
-d `
apache/nifi:latest

Aaaand,

Boom!

This seems like a good place to pause – we have NiFi running with the Graph bundles there, next time we’ll execute some queries against it.

Or try to, anyway.

Reactive Neo4j using .NET

Version 4.0 of Neo4j is being actively worked on, and aside from the new things in the database itself, the drivers get an update as well – and one of the big updates is the addition of a Reactive way to develop against the DB.

Now – I’ve not done reactive programming for a long time. I think I did play around with it when .NET 4 was first released, but I have no idea where that blog post has gone – so I may as well start anew.

I found it! Not the post, but the application – MousePath – which is now on GitHub: MousePath – aside from it ‘working’ it’s not performant in any way.

What is Rx/Reactive?

Reactive in .NET is all about the IObservable<T>/IObserver<T> interfaces. They’ve been around since .NET 4, but personally I’ve never really used them. They allow application code to react to data being pushed to it, rather than the more traditional way of requesting the data.
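As a minimal sketch of that push model (using the System.Reactive package – nothing Neo4j specific yet), the observer simply gets handed values as they arrive:

using System;
using System.Reactive.Linq;

// Observable.Range pushes 1, 2, 3 to anyone subscribed
IObservable<int> numbers = Observable.Range(1, 3);
numbers.Subscribe(
    n => Console.WriteLine($"Received {n}"),      // OnNext
    () => Console.WriteLine("Sequence complete")  // OnCompleted
);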

There’s a good book (Intro to Rx) which I will be using to work this out, and it’s freely available online: http://introtorx.com/.

Starting off

For this project, we’re going to need a NuGet package – which in this case isn’t Neo4j.Driver – but Neo4j.Driver.Reactive. When we add this to our project – and create a driver in the normal way – we can see we now have an ‘RxSession‘, which is an extension method of the IDriver.

So let’s create a reactive session and see what we can see.

RxSession

With the RxSession, we get IObservable as opposed to the AsyncSession giving us Tasks.

AsyncSession
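Side by side, the difference looks roughly like this (a sketch – treat the exact types as approximate, this is a pre-release driver after all):

// Async: Tasks that we await
var asyncSession = Driver.AsyncSession();
var cursor = await asyncSession.RunAsync("MATCH (m:Movie) RETURN m.title");

// Reactive: IObservables that we Subscribe() to
var rxSession = Driver.RxSession();
var rxResult = rxSession.Run("MATCH (m:Movie) RETURN m.title"); // IRxStatementResult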

Doing a Run-ner

So, back to our RxSession – let’s do a basic version, just using Run:

var session = Driver.RxSession();
var rxStatementResult = session.Run("MATCH (m:Movie) RETURN m.title");
rxStatementResult
    .Records()
    .Subscribe(
        record => Console.WriteLine("Got: " + record.Values["m.title"].As<string>())
    );

In here, we’re hooking up to the ol’ classic Movies database, and simply writing the titles to the screen. NB – Driver is a static property of type IDriver I have defined elsewhere.

The first two lines look pretty much like our normal code – the only real difference being the use of the ‘RxSession‘ as opposed to just ‘Session‘.

Run on an RxSession returns an IRxStatementResult – which has 3 methods we’re interested in (well, actually only 1 at the moment) – Records(), Consume() and Keys().

Records() gets us the records from the database – the stuff we want to do things with. Consume() whips through those records so we can get an IResultSummary telling us what went on, and Keys() gets us the keys that are returned – in the simple statement I’ve run, ‘m.title‘ is the only key.
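For completeness, a sketch of Consume() in the same style (the summary’s Counters property is what I’d poke at – though check the exact members against the driver version you have):

var session = Driver.RxSession();
session.Run("MATCH (m:Movie) RETURN m.title")
    .Consume()  // discard the records, push out an IResultSummary
    .Subscribe(summary => Console.WriteLine($"Nodes created: {summary.Counters.NodesCreated}"));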

Records() is what we’re using, as we want to deal with the data. Records() returns us an IObservable<IRecord>, and being IObservable – we need to Subscribe() to it to get the data. Subscribing means we provide an IObserver that will be notified whenever an IRecord arrives.

In this case, we have the contents being written to the console. Aces.

Quitting

Being a console app – doing tiny amounts of work – I largely don’t need to worry about disposing of my resources, but let’s imagine resource usage is something we do care about. How do you go about disposing of your resources?

IDisposable? INosable! – the IRxSession doesn’t implement IDisposable, instead we have to Close<T>() it – and this is where things have got a little fuzzy for me – I’m not entirely sure I’m closing it correctly.

var session = Driver.RxSession();
var rxStatementResult = session.Run("MATCH (m:Movie) RETURN m.title");
rxStatementResult
    .Records()
    .Subscribe(
        record => Console.WriteLine("Got: " + record.Values["m.title"].As<string>()));

session.Close<IRecord>();

Now, I expect to get either no results, or a smaller subset (depending on the speed of the code running) – what I get is the close being called, but still getting the full amount of data – I suspect I misunderstand what is going on here.

Let’s say I do want a smaller subset – or to quit – how do I do it? Well, the Subscribe() method actually returns an IDisposable – so we ‘unsubscribe’ by disposing of our subscription:

var session = Driver.RxSession();
var rxStatementResult = session.Run("MATCH (m:Movie) RETURN m.title");
var subscription = rxStatementResult
    .Records()
    .Subscribe(record => Console.WriteLine("Got: " + record.Values["m.title"].As<string>()));

await Task.Delay(220);
subscription.Dispose();
session.Close<IRecord>();

The ‘delay’ magic number there is enough time to get some records, but not all, anything less than that gave no results, anything more – all the results. Β―\_(ツ)_/Β―

Ooook So – Why Rx?

It seems more complex right? Subscribe(), unsubscribe – no foreach in sight! What’s the point?

My understanding – and this could be/probably is wrong – is that by using Rx – we’re reducing our overheads – i.e. instead of streaming everything, we can just stream what we’re consuming at the time.

The other key benefits come from things like ‘.Buffer‘ and the other commands (Skip, Last, etc) allowing you to stream things in a better way.
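As a sketch of what that can look like (assuming you pull in the System.Reactive package, which is where operators like Take and Buffer live):

using System.Reactive.Linq;

var session = Driver.RxSession();
session.Run("MATCH (m:Movie) RETURN m.title")
    .Records()
    .Take(10)   // only consume the first 10 records from the stream
    .Buffer(5)  // ...and handle them in batches of 5
    .Subscribe(batch => Console.WriteLine($"Got a batch of {batch.Count} records"));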

One nice thing about Rx in .NET is that it’s not the same as async – you don’t have to have your entire stack in Rx to get the benefits – you can do bits and pieces where you need to – if you’ve got a lot of data maybe it makes sense for a given query.

Execute Cypher Task Updates

Last week, Anabranch released version 1.1 of the tools for Neo4j – which included a very welcome addition to the toolset – being able to pull data from a Neo4j instance.

After doing the demo post – I noticed a peculiarity – a quirk if you will with how the ‘Execute Cypher Task’ (see here) worked with multiple Neo4j Connection Managers defined – it would execute the Cypher against all the Connection Managers, not any specific one. This makes sense in some form – as a Control Flow Task doesn’t have a ‘Connection’ selector unlike a Data Flow Task.

Version 1.2 of the tools fixes this and tidies up the Execute Cypher Task to have a better user interface as well.

Adding Cypher

The Cypher box still has highlighting, but now lines up to the edges properly.

Choosing A Connection

You can choose which connection you want to use from a drop down. You will only see Neo4j Connection Managers here, listed under whatever names you gave them.

Validation

You will get a red cross on your task if you’re missing things (in this case the connection). If you look in the ‘Error List’ you will be able to see all the errors:

Getting 1.2

To get version 1.2, please visit http://bit.ly/neo4jssis and register, and you’ll be sent the download link. Registration is only used to let you know of updates to the tools, no marketing!

Using a Data Flow to move data **from** Neo4j in SSIS

That’s right everyone! We’re going from Neo4j this time, and this is a new release – the old version (1.0.0.0) didn’t have a ‘Neo4j as a Source’ component; 1.1.0.0 does.

In the last post we took data from a file and ingested it into Neo4j, so far so good – but one of the things we were missing was the ability to also pull from Neo4j, now the circle is complete, and in this post – I’m going to show you how to pull from one Neo4j instance into another. That’s right – Neo4j to Neo4j!

As always – the video below shows the moving version of this post – but not everyone wants that.

The Setup

OK, this is more complex than normal, as we need multiple instances of Neo4j running – and whilst that’s not rocket science – it is more setup than usual. I don’t want to go into it in detail – but in short: I would run one DB in Neo4j Desktop, then download one of the server editions (Community will be just fine for this!).

Ports!

You need to change the ports on your new server, as the Desktop one will be using 7474/7687 etc – so open up the neo4j.conf file and change the following settings:
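For reference, mine looked something like this – 7676 for HTTP (matching the browser URL below); 7688 for Bolt is just my choice, any free port will do:

# neo4j.conf
dbms.connector.http.listen_address=:7676
dbms.connector.bolt.listen_address=:7688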

These are the ports I’m using – but go crazy and pick whatever you want – it’s your database after all. Aaaanyhews – I’m going to assume you know how to start your server version of the database. If not – there’s loads of stuff online about how to do it – and if it becomes clear that we’re in a world of pain here – I’ll write one πŸ™‚

Clear the DBs

WARNING!!! – which I don’t think we need – but here you go – make sure you know which DB you are doing this on! Don’t delete your production DB by mistake!!! (β€’_β€’)

On both the DBs we’re going to clear them, and add the ‘Movies’ demo set to one of them so – open up your browser window to both instances (http://localhost:7474 and http://localhost:7676 in my case) and execute:

MATCH (n) DETACH DELETE n

Then in one of the databases (and I will be using my 7474 database) execute:

:play movies

Then step through to the second step and put the movie data into your database. You can check the data is all there by running:

MATCH (n) RETURN COUNT(n)

You should get 171 nodes. OK, now we’re all set up and ready to go!

Let’s SSIS

As another assumption – I’m going with the fact that you know how to start up Visual Studio and create a new Package.

Let’s first add one connection:

And, obvs pick Neo4j:

Oooh – note the ‘version’ there as well – if yours says lower than that – then bad times πŸ™

Rename it to something like ‘The Source’ or whatever you find memorable:

Make sure the user / pass and server are all correct:

Looking good! Now – repeat – for the other server – remembering the port will be different – and choosing a different name, something like ‘The Destination’ for example, and you should end up with this state of affairs:

Let’s add a ‘Data Flow’ to our package now, again you can rename if you want. I did, but don’t let that force you into doing anything:

Double click on it, and we’re into Data Flow design heaven!

Add the Source

Drag the ‘Execute Cypher Source’ component from the toolbox onto the page:

Double click on it to enter the ‘Edit’ page:

The Cypher we’re going to execute is:

MATCH (m:Movie) RETURN m.title AS title

Now – some TOP TIPS. This works best if you RETURN specific columns, SSIS doesn’t know what to do with a full node, and using the AS there makes the output columns easier to use.
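As a contrast, here’s a quick sketch of the shape to avoid versus the shape to use:

// Avoid: returns whole nodes, which SSIS can't map to columns
MATCH (m:Movie) RETURN m

// Prefer: named scalar columns
MATCH (m:Movie) RETURN m.title AS title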

Once you’ve got the Cypher – you need to select the Connection to use (see the picture) – which is why naming them nicely is SUPER useful.

Once you’ve done that, hit ‘Refresh’ to get the Output Columns populated:

Job done. Good work!

Add the Destination

No surprises for guessing this involves dragging the Destination to the page.

Next, join up the Source to the Destination:

The UI for this is not as fully fleshed out as the Source’s, so unfortunately we need to head into the Advanced Editor. So right click on it, and open the Advanced Editor:

First we want to set the connection:

Again – naming!!

Then we’re going to go to the ‘Input’ tab and select our input from the Source:

Press OK to save all that, and then double click on the Destination item and go to the Cypher Editor:

First off – you can see ‘title’ listed in the parameters, so that’s good. Cypher-wise we’re doing a MERGE – so we only get one ‘Cloud Atlas’ (because no-one needs more than one of those).

MERGE (:Movie {title: $title})

At this point, we have our two things and no red crosses or errors anywhere, so let’s run it!

Run it!

No surprises – we press ‘Start’ and get the ‘liney’ version of the page which hopefully you see as:

38 rows (hahaha Rows!) and if you go to your ‘Destination’ database you should see the movies there.

I want it

Of course you do – these controls are currently in an open beta, to register to get the controls, please go to: http://bit.ly/neo4jssis

Neo4j & SSIS – Connecting and executing Cypher in a Control Flow

Last Friday, Anabranch released the first beta version of its connector to Neo4j from SSIS. Aside from a post saying that it existed, I didn’t go into detail, so this is going to be a series of posts on how you can use your existing SSIS infrastructure with Neo4j.

Today we’re going to look at 2 parts of the connector, the Neo4j Connection Manager (CM) and the Execute Cypher Task (ECT). The CM is fundamental to all the controls, without it, you can’t connect to the database. I’ll go into what it does, settings etc in another post, but for now – it’s enough to know that it provides the connection. The ECT allows us to execute Cypher against a given connection manager.

** NOTE **
In version 1.0.0(beta) – the ECT will only work with the first CM you add to the package

This video covers the same topic as the text version below:

I’m going to develop this in Visual Studio 2017, at the time of writing – I found the 2019 SSIS packages to be a bit flakey, whereas the 2017 has been sturdy so far – from a ‘demo’ point of view though – the 2019 process is exactly the same after you have it all installed.

SETUP
If you’ve never developed against SSIS before, you’ll need a couple of things: firstly SSDT (specifically the Integration Services bits), and Visual Studio – I think the Community edition should work, but I can’t confirm. You’ll also need the Anabranch SSIS Controls for Neo4j – assuming you’ve registered (http://anabranch.co.uk/Projects/Neo4jSsis) and have the download link, you’ll want the 2017 x86 version of the controls (for VS2019 as well!).

Download and install the controls. NB. You want to install these when Visual Studio isn’t running – as we’re in the heady world of the GAC here, and VS won’t find them unless it’s started with them there.

To do this example yourself – you’ll also need a Neo4j database instance running; I’d recommend using the Neo4j Desktop as it makes it easier to manage the process.

Create your first package

1. Start up Visual Studio
2. Create a new Integration Services project

New Project…

3. In the new Package.dtsx file, we need to add a Connection Manager. Right click on the bottom ‘Connection Managers’ bar and add a Neo4j connection – if you don’t see it – you might have to restart Visual Studio, or possibly your machine.

Then select the Neo4j Connection:

You’ll now see it in the ‘Connection Managers’ section:

Select it – and change the connection properties to ones that match your database instance – at the moment this is done via the properties window:

At this stage, we have a connection – but we’re not using it, so let’s add a task to execute:

Drag the ‘Execute Cypher Task’ to the Control Flow, and double click on it. Then add the following Cypher:

CREATE (:Node {Id:1})

Press OK

Then we can execute the task:

Once that’s done:

If we go to our Neo4j Database, we can run:

MATCH (n:Node) RETURN n

If we look at the ‘Id’ property – we can see it is ‘1’

So. Now we have an SSIS integration package executing against a Neo4j database.

These controls are currently in an open beta, to register to get the controls, please go to: http://bit.ly/neo4jssis

Actually using the new DataConnector for PowerBI

After I’d written it – I realised my last post was perhaps not the most useful for those who really don’t care about the how, but just want to know what to do to use it. So this will follow the same deal as the last post (over a year and a half ago!! WOW!).

Video version below if you want it:

The Setup Steps

First – we’ve got to install PowerBI – now, I didn’t sign up for an account, but downloaded it from the PowerBI website, and installing was simple and quick.

We also need to have Neo4j running, and you can use Community or Enterprise, it matters not – and we’ll want to put the β€˜Movies’ dataset in there, so run your instance, and execute:

:play movies

Add the Data Connector to Power BI

1. First – download the connector from the releases page (or build it yourself in VS) – you want the `Neo4j.mez` file.

Version 1: https://github.com/cskardon/Neo4jDataConnectorForPowerBi/releases/tag/1.0.0

2. PowerBI looks for custom connectors in the <USER>\Documents\Power BI Desktop\Custom Connectors folder, which if it doesn’t exist – you’ll need to create. Once you have that folder, copy the Neo4j.mez connector there.
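If you prefer a shell to Explorer, something like this does the job (a sketch – assuming the default Documents location):

$dest = "$([Environment]::GetFolderPath('MyDocuments'))\Power BI Desktop\Custom Connectors"
New-Item -ItemType Directory -Force -Path $dest | Out-Null  # create the folder if it's not there
Copy-Item .\Neo4j.mez $dest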


3. Nearly there – we just need to allow PowerBI to load the connector now – so, start up PowerBI and go to the Options dialog:


Once there, select the β€˜Security’ option, and then under the Data Extensions header select the option allowing you to load any extension without validation or warning:


You’ll have to restart PowerBI to get the connector to be picked up – so go ahead and do that now!

Lets Get Some Data!

Now – I know a lot of you will have been excited by the Pie Chart from the last post – now you can create your own!

With a new instance of PowerBI running, let’s select β€˜Get Data’


We can now either search for ‘Neo4j’ or look in the ‘Database’ types:

Select Neo4j, then press ‘Connect’ – aaaaand a warning!

Read it, ignore it – it’s up to you – but this is just to let you know that it’s still in Beta (I mean it’s only had one release so far!) Continue if you’re happy to.

Now you’re given the boxes to enter your Cypher and connection information – the text box for the Cypher field is a single line (ugh) – so if it’s a complicated query – you’re probably best off writing it in Sublime or similar (maybe even Notepad!?!). In this case, we can go simple:

MATCH (m:Movie) RETURN m;

Now, the other settings, if you’re running default settings – you can leave these, but obviously if you need to connect to https instead of http change it in the scheme setting. I’ve filled in my display with the defaults:

When you press OK, you get the Login dialog – if you are connecting anonymously, select Anonymous; otherwise, fill in your username / password.

Press ‘Connect’, and PowerBI will connect to your DB and return you back a list of ‘Record’:

We’ll want to ‘Edit’ this, so press ‘Edit’!

When the Power Query Editor opens press the expand column button at the top of the ‘m’ column:

Ooooooh, ‘tagline’, ‘title’ and ‘released’ — our movie properties! For this, I would turn off the ‘use original column name as prefix’ checkbox, leave them all selected and press OK.

Data!!

Now, let’s ‘Close & Apply’ our query – NB – if you look in the ‘Applied Steps’ section, you can see we only have 2 steps, ‘Source’ and ‘Expanded m’

Whilst PowerBI applies it just think of the Pie charts that lie ahead of us:

When that dialog disappears, we’re good to go! On the right you’ll see a ‘Fields’ section, and you should see something like this:

So, the moment we’ve all waited for…

Let’s Pie Chart

Select ‘Pie Chart’ from the Visualizations section:

Once it’s in your display – select it and drag the ‘released’ field from the Query1 to the ‘details’ field, and then title to the ‘values’ field:

Your chart should look something like this now:

So let’s max size it, and mouse over it, now we can see:

But we can also drag ‘title’ on to the Tooltips field like so to get the First (or last) movie in that group:

What does a query look like under the covers?

Some of us like visual, some like code, the last time we tried this – our query was 20 lines long – our new query though – that’s just 5!

let
    Source = Neo4j.ExecuteCypher("MATCH (m:Movie) RETURN m;", "http", "localhost", 7474),
    #"Expanded m" = Table.ExpandRecordColumn(Source, "m", {"tagline", "title", "released"}, {"tagline", "title", "released"})
in
    #"Expanded m"

Which is much nicer.

Hey hey! It’s Beta!

The Data Connector approach gives a much nicer way to query the database, it strips out a lot of the code we have to write, and hopefully makes the querying easier.

BUT – I am not a PowerBI expert – is this the right way to do this? Are there improvements? Some hardcoded queries we should have there? Let me know – do a PR – it’s all good!

PowerBI With Neo4j – How do you build a DataConnector?


TL;DR;

Repo is at: https://github.com/cskardon/Neo4jDataConnectorForPowerBi
Release at: https://github.com/cskardon/Neo4jDataConnectorForPowerBi/releases

Looky! Pie Charts!

Pie Chart of Movies

This glorious picture represents the very pinnacle of my PowerBI experience, beforehand I was pulling the data into Excel and charting myself – no longer!

Jokes aside, the big news here is that I’ve dramatically improved upon my previous post, where I showed how you could connect to a security enabled Neo4j instance from PowerBI by generating your own base64 encoded string. All in all, that’s a terrible approach – sure, it works, but it’s not really manageable for any real use.

Writing a Power BI data connector.

There are a few guides on this; I found the Microsoft repo on GitHub for Data Connectors to be super handy. In essence you write them in ‘M Power Query’, which is Power BI’s query language of choice. I opted to write my connector in Visual Studio – so went and got the Power Query SDK extension.

The nice thing about this is that it allows me to test my connector without needing to constantly start/stop PowerBI. So! We get that installed and create a new Data Connector project:

New Project

This gets you a new Data Connector project with two files that you’ll initially care about: a .pq file and a .query.pq file – the latter being a ‘unit test’ file. Let’s first look at the .pq file.

.PQ

A .pq file is simply a Power Query file – it’s written in M, and if you’re a PowerBI specialist – I assume that’s all good – for a non-PowerBI user (me) it means learning some stuff.

So, if you just F5 the project you should get a swirly thing, followed by an error saying credentials are needed.

Select ‘Anonymous’

Then press ‘Set Credential’ – then press F5 again – results!

OK, so what did we actually run when we pressed F5? Remember the .query.pq file? That is executing the Contents() query on the default connector.

let
    result = PQExtension1.Contents()
in
    result

OK, so far so – hum drum. This is really to get you used to the Power BI development experience. The good news is that we can just copy / paste from the old post I did and have a working function – taking into account that we have the same data (movies DB) and the same user / pass (neo4j/neo).

[DataSource.Kind="PQExtension1", Publish="PQExtension1.Publish"]
shared PQExtension1.Contents = () =>
let
    Source = 
        Json.Document(
            Web.Contents("http://localhost:7474/db/data/transaction/commit",
            [
                Headers=[Authorization="Basic bmVvNGo6bmVv"],
                Content=Text.ToBinary("{""statements"" : [ {
                        ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
                        }")
            ])),
    results = Source[results]
in
    results;

You should get results saying: [Record] – which is what you want; if you get that, you have successfully connected to Neo4j! Good job! Now, first things first, let’s strip out the Authorization header and auto-generate it.

Authorization

We have two forms, anonymous and user/pass. For anonymous, we don’t want to send the header, for user/pass – we obviously do. Let’s start with user/pass as we’re already there.

So let’s add another function, it’ll generate the headers for us, let’s firstly hardcode it:

DefaultRequestHeaders = [
    // Base64 encode 'user:pass' for the Basic auth scheme
    #"Authorization" = "Basic " & Binary.ToText(Text.ToBinary("neo4j:neo"), BinaryEncoding.Base64)
];

Changing our function to be:

[DataSource.Kind="PQExtension1", Publish="PQExtension1.Publish"]
shared PQExtension1.Contents = () =>
let
    Source = 
        Json.Document(
            Web.Contents("http://localhost:7474/db/data/transaction/commit",
            [
                //Change HERE vvvvv
                Headers=DefaultRequestHeaders,
                Content=Text.ToBinary("{""statements"" : [ {
                        ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
                        }")
            ])),
    results = Source[results]
in
    results;

Pressing F5 will connect and get the same result. So – we know we can do the base64 conversion in PowerBI – this is good. But, still – not ideal to have usernames and passwords hardcoded – for some reason. So let’s let PowerBI get us a user/pass.

Navigate to the ‘Data Source Kind description’ section and add UsernamePassword there:

// Data Source Kind description
PQExtension1 = [
    Authentication = [
        // Key = [],
        UsernamePassword = [],
        // Windows = [],
        Implicit = []
    ],
    Label = Extension.LoadString("DataSourceLabel")
];

You can add things like ‘UsernameLabel’ in there if you want (I have, but for the purposes of this – I’m not gonna bother) – they make it look pretty for the PowerBI people out there πŸ™‚
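For instance, something along these lines (UsernameLabel / PasswordLabel being the fields the SDK looks for, if memory serves):

UsernamePassword = [
    UsernameLabel = "Neo4j User",
    PasswordLabel = "Neo4j Password"
],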

OK, when you connect now (you might have to delete your credentials – the easiest way being selecting the ‘Credentials’ tab in the M Query Output window and ‘Delete Credential’) you will be able to select User/Pass as an option

But hey! We’re not actually using it yet! So let’s get the values from PowerBI using a handy function (that is really hard to find out about) called: Extension.CurrentCredential() with which we can get the username / password, so let’s update our DefaultRequestHeaders to use it:

DefaultRequestHeaders = [
    #"Authorization" = "Basic " & Neo4j.Encode(Extension.CurrentCredential()[Username], Extension.CurrentCredential()[Password])
];
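Neo4j.Encode itself isn’t shown here – a minimal sketch of a helper along those lines (this sketch is mine, not necessarily the exact implementation) would be:

Neo4j.Encode = (user as text, pass as text) as text =>
    // Base64 encode 'user:pass' for the Basic auth scheme
    Binary.ToText(Text.ToBinary(user & ":" & pass), BinaryEncoding.Base64);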

OK, the dream is alive! For anonymous, basically we want to remove the headers, to do that we need to check what type of authentication is in use, and we’re back to the hilariously undocumented Extension.CurrentCredential method again:

Headers = if Extension.CurrentCredential()[AuthenticationKind] = "Implicit" then null else  DefaultRequestHeaders,

We look for ‘Implicit’ as that’s what Anonymous is – with this we set the Headers to null if we’re anonymous, and the headers if not – ACE!

Getting Stuff

The crux of the whole operation, now we’re able to connect with user/pass and anonymous, it’s probably time we dealt with the hardcoded Cypher. Let’s take in a parameter to our method:

[DataSource.Kind="PQExtension1", Publish="PQExtension1.Publish"]
shared PQExtension1.Contents = (cypher as text) =>
let
    Source = 
        Json.Document(
            Web.Contents("http://localhost:7474/db/data/transaction/commit",
            [
                Headers=DefaultRequestHeaders,
                //Change HERE vvvvv – the cypher parameter gets spliced into the statement
                Content=Text.ToBinary("{""statements"" : [ {
                        ""statement"" : "" " & cypher & " ""} ]
                        }")
            ])),
    results = Source[results]
in
    results;

Excellent – now we need to change our .query.pq file to call it:

let 
	result = PQExtension1.Contents("MATCH (n) RETURN COUNT(n)")
in
	result

F5 and see what happens – now we’re passing Cypher to the instance, and getting back results.

This largely covers how to build your own connector, if you look in the source code (and I encourage you to – it’s only 156 lines long including comments) – in it you’ll see I’ve abstracted out some of the stuff we’ve done here, named things properly, and I also pull in the address, port and scheme to allow a user to set it.

Better Know APOC #4 : apoc.coll.sort*

Neo4j Version: 3.3.4
APOC Version: 3.3.0.2

If you haven’t already, please have a look at the intro post to this series – it covers stuff which I’m not going to go into here.


OK, ‘apoc.coll’ has 43 (that’s right – 43) functions and procedures, but I’m only going to cover the ‘sort’ ones in this post – why? Because a post containing 43 different functions – whilst a good % of the overall set – would be way too long.

As it is, with ‘sort’ we have 4 functions:

  • apoc.coll.sort
  • apoc.coll.sortMaps
  • apoc.coll.sortMulti
  • apoc.coll.sortNodes

The Whys

These are methods to sort collections, the clue is in the name, but why do we need them? We can sort in Cypher right? We have ‘ORDER BY‘, who wrote this extra bit of code that has no use?? Who?!?!

When Hunger strikes, run for cover

Hunger. Hmmm. Given his pedigree, we may have to assume this was done for a reason… Let’s explore that a bit with the apoc.coll.sort method…

apoc.coll.sort

This is your basic sort method, given a collection, return it sorted. It doesn’t matter what type the collection is, it will sort it.

Parameters

Just the one in and one out – the in is the collection to sort, the out is the sorted collection.

Examples

We’ll look (for this case) at doing it the traditional Cypher way, and then the APOC way.

The Cypher way

It’s worth seeing the Cypher way so you can appreciate sort, this is based on this question on Stack Overflow.

We’ll have a collection which is defined as such:

WITH [2,3,6,5,1,4] AS collection

Let’s sort this the Cypher way – easy!

WITH [2,3,6,5,1,4] AS collection
 RETURN collection ORDER BY ????

Errr, ok, looks like we’re gonna need to tap into some unwinding!

WITH [2,3,6,5,1,4] AS collection
UNWIND collection AS item
WITH item ORDER BY item
RETURN collect(item) AS sorted

That’s got it! So we UNWIND the collection, then WITH each item (ORDER BY), we COLLECT them back again.

The APOC way

WITH [2,3,6,5,1,4] AS collection
RETURN apoc.coll.sort(collection) AS sorted

That’s a lot easier to read, and it’s also a lot easier to use inline. The Cypher version above might look OK, but imagine you have a more complicated query and you need to do multiple sorts, or even just anything extra – it can quickly become unwieldy.

apoc.coll.sortMaps

A Map (or Dictionary for those .NETters out there) is the sort of thing we return from Neo4j all the time, and this function allows us to sort on a given property of a Map.

Examples

For these examples, we’ll have ‘coll’ defined as:

WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll

An array of maps, each with an ‘Str‘ property and a ‘Num‘ property.

Sort by string

WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll
RETURN apoc.coll.sortMaps(coll, 'Str')

Returns us a list of the maps, looking like:

╒══════════════════════════════════════════════════════════════════════╕
β”‚"apoc.coll.sortMaps(coll, 'Str')"                                     β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•‘
β”‚[{"Str":"A","Num":4},{"Str":"B","Num":3},{"Str":"C","Num":2},{"Str":"Dβ”‚
β”‚","Num":1}]                                                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

In which we can see the maps go from ‘A’ to ‘D’

Sort by Number

WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll
RETURN apoc.coll.sortMaps(coll, 'Num')

Unsurprisingly, this gets us the following:

╒══════════════════════════════════════════════════════════════════════╕
β”‚"apoc.coll.sortMaps(coll, 'Num')"                                     β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•‘
β”‚[{"Str":"D","Num":1},{"Str":"C","Num":2},{"Str":"B","Num":3},{"Str":"Aβ”‚
β”‚","Num":4}]                                                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Which goes from 1 to 4.

Sort order is Ascending, and there is no way to do a descending sort – you basically do a ‘reverse’ to get the sort the other way.
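So a descending sort ends up looking something like this (leaning on apoc.coll.reverse):

WITH [2,3,6,5,1,4] AS collection
RETURN apoc.coll.reverse(apoc.coll.sort(collection)) AS sortedDesc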

apoc.coll.sortMulti

This is the equivalent of doing a ‘Sort, Then By’ – so if I take the ‘sortMaps’ function above and run it like so:

WITH [{First:'B', Last:'B'}, {First:'A', Last:'A'}, {First:'B', Last:'A'}, {First:'C', Last:'A'}] AS coll
RETURN apoc.coll.sortMaps(coll, 'First')

I get:

╒══════════════════════════════════════════════════════════════════════╕
β”‚"apoc.coll.sortMaps(coll, 'First')"                                   β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•‘
β”‚[{"Last":"A","First":"A"},{"Last":"B","First":"B"},{"Last":"A","First"β”‚
β”‚:"B"},{"Last":"A","First":"C"}]                                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The problem here is the two elements:

{"Last":"B","First":"B"},{"Last":"A","First":"B"}

I want these to be the other way around, so I have to switch to ‘Multi’:

WITH [{First:'B', Last:'B'}, {First:'A', Last:'A'}, {First:'B', Last:'A'}, {First:'C', Last:'A'}] AS coll
UNWIND apoc.coll.sortMulti(coll, ['^First', '^Last']) AS unwound
RETURN unwound.First AS first, unwound.Last AS last

This gets me:

╒═══════╀══════╕
β”‚"first"β”‚"last"β”‚
β•žβ•β•β•β•β•β•β•β•ͺ══════║
β”‚"A"    β”‚"A"   β”‚
β”‚"B"    β”‚"A"   β”‚
β”‚"B"    β”‚"B"   β”‚
β”‚"C"    β”‚"A"   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜

One thing to note here (and I think it’s quite important): this is the only method that defaults to Descending order. To get an Ascending sort, you have to prefix columns with a ‘^’ character (as I’ve done in this case).

apoc.coll.sortNodes

Nearly there! This takes a collection of nodes and sorts them on 1 property – so, let’s add some nodes:

CREATE (n1:CollNode {col1: 1, col2: 'D'})
CREATE (n2:CollNode {col1: 2, col2: 'C'})
CREATE (n3:CollNode {col1: 3, col2: 'B'})
CREATE (n4:CollNode {col1: 4, col2: 'A'})

And let’s do a sort:

MATCH (n:CollNode)
WITH apoc.coll.sortNodes(COLLECT(n), 'col2') AS sorted
UNWIND sorted AS n
RETURN n.col1 AS col1, n.col2 AS col2

Now, you could argue this adds little to the party as you can already ORDER BY, and by and large you’d be right – the nice thing about the APOC version is that you can call it inline as I have above, rather than having to do the sort afterwards. Having said that, ORDER BY does have a DESC keyword, which sortNodes does not :/

Conclusions

apoc.coll.sort* is useful – that’s the main thrust. Some are more useful than others, and there are a few omissions (like the ability to sort descending in all but the sortMulti method) which could make good, simple pull requests.

They are what they are, sorting methods πŸ™‚