Execute Cypher Task Updates

Last week, Anabranch released version 1.1 of the tools for Neo4j, which included a very welcome addition to the toolset: the ability to pull data from a Neo4j instance.

After doing the demo post, I noticed a peculiarity – a quirk if you will – with how the β€˜Execute Cypher Task’ (see here) worked with multiple Neo4j Connection Managers defined: it would execute the Cypher against all the Connection Managers, not any specific one. This makes sense in some form, as a Control Flow Task doesn’t have a β€˜Connection’ selector, unlike a Data Flow Task.

Version 1.2 of the tools fixes this and tidies up the Execute Cypher Task to have a better user interface as well.

Adding Cypher

The Cypher box still has highlighting, but now lines up to the edges properly.

Choosing A Connection

You can choose which connection you want to use from a drop down – you will only see Neo4j Connection Managers here. The name shown is whatever you named the Connection Manager.

Validation

You will get a red cross on your task if you’re missing things (in this case the connection). If you look in the ‘Error List’ you will be able to see all the errors:

Getting 1.2

To get version 1.2, please visit http://bit.ly/neo4jssis, register, and you’ll be sent the download link. Registration is only used to let you know of updates to the tools – no marketing!

Using a Data Flow to move data **from** Neo4j in SSIS

That’s right everyone! We’re going from Neo4j this time – and this needs the new release: the old version (1.0.0.0) didn’t have a β€˜Neo4j as a Source’ component; 1.1.0.0 does.

In the last post we took data from a file and ingested it into Neo4j – so far so good – but one of the things we were missing was the ability to also pull from Neo4j. Now the circle is complete, and in this post I’m going to show you how to pull from one Neo4j instance into another. That’s right – Neo4j to Neo4j!

As always – the video below shows the moving version of this post – but not everyone wants that.

The Setup

OK, this setup is more complex than normal, as we need multiple instances of Neo4j running – and whilst that’s not rocket science, it does take a bit more juggling. I don’t want to go into it particularly – but hey! I would run one DB in Neo4j Desktop, then download one of the server editions (for this, Community will be just fine!)

Ports!

You need to change the ports on your new server, as the Desktop ones will be using 7474/7687 etc. So open up the neo4j.conf file and change the following settings:
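
(The screenshot of the settings didn’t survive here, so as a sketch – assuming the Neo4j 3.x setting names – it’s the connector listen addresses you’re after. 7676 is the HTTP port I use below; the Bolt port just needs to differ from 7687, and 7689 is purely my example:)

dbms.connector.http.listen_address=:7676
dbms.connector.bolt.listen_address=:7689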

These are the ports I’m using – but go crazy and pick whatever you want – it’s your database after all. Aaaanyhews – I’m going to assume you know how to start your server version of the database. If not – there’s loads of stuff online about how to do it – and if it becomes clear that we’re in a world of pain here – I’ll write one πŸ™‚

Clear the DBs

WARNING!!! – which I don’t think we need – but here you go – make sure you know which DB you are doing this on! Don’t delete your production DB by mistake!!! (β€’_β€’)

On both the DBs we’re going to clear them, and add the ‘Movies’ demo set to one of them so – open up your browser window to both instances (http://localhost:7474 and http://localhost:7676 in my case) and execute:

MATCH (n) DETACH DELETE n

Then in one of the databases (and I will be using my 7474 database) execute:

:play movies

Then step through to the second slide and put the movie data into your database. You can check the data is all there by running:

MATCH (n) RETURN COUNT(n)

You should get 171 nodes. OK, now we’re all set up and ready to go!

Let’s SSIS

As another assumption – I’m going with the fact that you know how to start up Visual Studio and create a new Package.

Let’s first add one connection:

And, obvs pick Neo4j:

Oooh – note the ‘version’ there as well – if yours says lower than that – then bad times πŸ™

Rename it to something like ‘The Source’ or whatever you find memorable:

Make sure the user / pass and server are all correct:

Looking good! Now – repeat – for the other server – remembering the port will be different – and choosing a different name, something like ‘The Destination’ for example, and you should end up with this state of affairs:

Let’s add a ‘Data Flow’ to our package now, again you can rename if you want. I did, but don’t let that force you into doing anything:

Double click on it, and we’re into Data Flow design heaven!

Add the Source

Drag the ‘Execute Cypher Source’ component from the toolbox onto the page:

Double click on it to enter the ‘Edit’ page:

The Cypher we’re going to execute is:

MATCH (m:Movie) RETURN m.title AS title

Now – some TOP TIPS. This works best if you RETURN specific columns, SSIS doesn’t know what to do with a full node, and using the AS there makes the output columns easier to use.
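
By way of a quick contrast (my own made-up example):

// Avoid – SSIS doesn't know how to map a whole node to columns
MATCH (m:Movie) RETURN m

// Prefer – specific, aliased columns
MATCH (m:Movie) RETURN m.title AS title, m.released AS released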

Once you’ve got the Cypher – you need to select the Connection to use (see the picture) – which is why naming them nicely is SUPER useful.

Once you’ve done that, hit ‘Refresh’ to get the Output Columns populated:

Job done. Good work!

Add the Destination

No surprises for guessing this involves dragging the Destination to the page.

Next, join up the Source to the Destination:

The UI for this is not as fully fleshed out as the other, so unfortunately we need to head into the Advanced Editor. So right click on it, and open the Advanced Editor:

First we want to set the connection:

Again – naming!!

Then we’re going to go to the ‘Input’ tab and select our input from the Source:

Press OK to save all that, and then double click on the Destination item and go to the Cypher Editor:

First off – you can see the β€˜title’ listed in the parameters, so that’s good. Cypher-wise we’re doing a MERGE – so we only get one β€˜Cloud Atlas’ (because no-one needs more than one of those).

MERGE (:Movie {title: $title})

At this point, we have our two things and no red crosses or errors anywhere, so let’s run it!

Run it!

No surprises – we press ‘Start’ and get the ‘liney’ version of the page which hopefully you see as:

38 rows (hahaha Rows!) and if you go to your ‘Destination’ database you should see the movies there.

I want it

Of course you do – these controls are currently in an open beta, to register to get the controls, please go to: http://bit.ly/neo4jssis

Neo4j & SSIS – Connecting and executing Cypher in a Control Flow

Last Friday, Anabranch released the first beta version of its connector to Neo4j from SSIS. Aside from a post saying that it existed, I didn’t go into detail, so this is going to be a series of posts on how you can use your existing SSIS infrastructure with Neo4j.

Today we’re going to look at 2 parts of the connector, the Neo4j Connection Manager (CM) and the Execute Cypher Task (ECT). The CM is fundamental to all the controls, without it, you can’t connect to the database. I’ll go into what it does, settings etc in another post, but for now – it’s enough to know that it provides the connection. The ECT allows us to execute Cypher against a given connection manager.

** NOTE **
In version 1.0.0 (beta) – the ECT will only work with the first CM you add to the package

This video covers the same topic as the text version below:

I’m going to develop this in Visual Studio 2017, at the time of writing – I found the 2019 SSIS packages to be a bit flakey, whereas the 2017 has been sturdy so far – from a ‘demo’ point of view though – the 2019 process is exactly the same after you have it all installed.

SETUP
If you’ve never developed against SSIS before, you’ll need a couple of things: firstly SSDT (specifically the Integration Services bits), and Visual Studio – I think the Community edition should work, but I can’t confirm. You’ll also need the Anabranch SSIS Controls for Neo4j – assuming you’ve registered (http://anabranch.co.uk/Projects/Neo4jSsis) and have the download link, you’ll want the 2017 x86 version of the controls (for VS2019 as well!).

Download and install the controls. NB. You want to install these when Visual Studio isn’t running – as we’re in the heady world of the GAC here, and VS won’t find them unless it’s started with them there.

To do this example yourself, you’ll also need a Neo4j database instance running – I’d recommend using the Neo4j Desktop as it makes it easier to manage the process.

Create your first package

1. Start up Visual Studio
2. Create a new Integration Services project

New Project…

3. In the new Package.dtsx file, we need to add a Connection Manager. Right click on the bottom ‘Connection Managers’ bar and add a Neo4j connection – if you don’t see it – you might have to restart Visual Studio, or possibly your machine.

Then select the Neo4j Connection:

You’ll now see it in the ‘Connection Managers’ section:

Select it – and change the connection properties to ones that match your database instance – at the moment this is done via the properties window:

At this stage, we have a connection – but we’re not using it, so let’s add a task to execute:

Drag the ‘Execute Cypher Task’ to the Control Flow, and double click on it. Then add the following Cypher:

CREATE (:Node {Id:1})

Press OK

Then we can execute the task:

Once that’s done:

If we go to our Neo4j Database, we can run:

MATCH (n:Node) RETURN n

If we look at the ‘Id’ property – we can see it is ‘1’

So. Now we have an SSIS integration package executing against a Neo4j database.

These controls are currently in an open beta, to register to get the controls, please go to: http://bit.ly/neo4jssis

Actually using the new DataConnector for PowerBI

After I’d written it – I realised my last post was perhaps not the most useful for those who really don’t care about the how but want to know what to do to use it. So this will follow the same deal as with the last post (over a year and a half ago!! WOW!).

Video version below if you want it:

The Setup Steps

First – we’ve got to install PowerBI – now, I didn’t sign up for an account, but downloaded it from the PowerBI website, and installing was simple and quick.

We also need to have Neo4j running, and you can use Community or Enterprise, it matters not – and we’ll want to put the β€˜Movies’ dataset in there, so run your instance, and execute:

:play movies

Add the Data Connector to Power BI

1. First – download the connector from the releases page (or build it yourself in VS) – you want the `Neo4j.mez` file.

Version 1: https://github.com/cskardon/Neo4jDataConnectorForPowerBi/releases/tag/1.0.0

2. PowerBI looks for custom connectors in the <USER>\Documents\Power BI Desktop\Custom Connectors folder, which if it doesn’t exist – you’ll need to create. Once you have that folder, copy the Neo4j.mez connector there.


3. Nearly there – we just need to allow PowerBI to load the connector now – so, start up PowerBI and go to the Options dialog:


Once there, select the β€˜Security’ option, and then under the Data Extensions header select the option allowing you to load any extension without validation or warning:


You’ll have to restart PowerBI to get the connector to be picked up – so go ahead and do that now!

Let’s Get Some Data!

Now – I know a lot of you will have been excited by the Pie Chart from the last post – now you can create your own!

With a new instance of PowerBI running, let’s select β€˜Get Data’


We can now either search for ‘Neo4j’ or look in the ‘Database’ types:

Select Neo4j, then press β€˜Connect’ – aaaaand a warning!

Read it, ignore it – it’s up to you – but this is just to let you know that it’s still in Beta (I mean it’s only had one release so far!) Continue if you’re happy to.

Now you’re given the boxes to enter your Cypher and connection information – the text box for the Cypher field is a single line (ugh) – so if it’s a complicated query, you’re probably best off writing it in Sublime or similar (maybe even Notepad!?!). In this case, we can go simple:

MATCH (m:Movie) RETURN m;

Now, the other settings, if you’re running default settings – you can leave these, but obviously if you need to connect to https instead of http change it in the scheme setting. I’ve filled in my display with the defaults:

When you press OK, you get the Login dialog; if you are connecting anonymously, select Anonymous, else fill in your username / password.

Press ‘Connect’, and PowerBI will connect to your DB and return you back a list of ‘Record’:

We’ll want to ‘Edit’ this, so press ‘Edit’!

When the Power Query Editor opens press the expand column button at the top of the ‘m’ column:

Ooooooh, ‘tagline’, ‘title’ and ‘released’ — our movie properties! For this, I would turn off the ‘use original column name as prefix’ checkbox, leave them all selected and press OK.

Data!!

Now, let’s ‘Close & Apply’ our query – NB – if you look in the ‘Applied Steps’ section, you can see we only have 2 steps, ‘Source’ and ‘Expanded m’

Whilst PowerBI applies it just think of the Pie charts that lie ahead of us:

When that dialog disappears, we’re good to go! On the right you’ll see a ‘Fields’ section, and you should see something like this:

So, the moment we’ve all waited for…

Let’s Pie Chart

Select ‘Pie Chart’ from the Visualizations section:

Once it’s in your display – select it and drag the ‘released’ field from the Query1 to the ‘details’ field, and then title to the ‘values’ field:

Your chart should look something like this now:

So let’s max size it, and mouse over it, now we can see:

But we can also drag ‘title’ on to the Tooltips field like so to get the First (or last) movie in that group:

What does a query look like under the covers?

Some of us like visual, some like code, the last time we tried this – our query was 20 lines long – our new query though – that’s just 5!

let
    Source = Neo4j.ExecuteCypher("MATCH (m:Movie) RETURN m;", "http", "localhost", 7474),
    #"Expanded m" = Table.ExpandRecordColumn(Source, "m", {"tagline", "title", "released"}, {"tagline", "title", "released"})
in
    #"Expanded m"

Which is much nicer.

Hey hey! It’s Beta!

The Data Connector approach gives a much nicer way to query the database, it strips out a lot of the code we have to write, and hopefully makes the querying easier.

BUT – I am not a PowerBI expert – is this the right way to do this? Are there improvements? Some hardcoded queries we should have there? Let me know – do a PR – it’s all good!

PowerBI With Neo4j – How do you build a DataConnector?


TL;DR;

Repo is at: https://github.com/cskardon/Neo4jDataConnectorForPowerBi
Release at: https://github.com/cskardon/Neo4jDataConnectorForPowerBi/releases

Looky! Pie Charts!

Pie Chart of Movies

This glorious picture represents the very pinnacle of my PowerBI experience, beforehand I was pulling the data into Excel and charting myself – no longer!

Jokes aside, the big news here is that I’ve dramatically improved upon my previous post where I showed how you could connect to a security enabled Neo4j instance from PowerBI by generating your own base64 encoded string. All in all, that’s a terrible approach, sure – it works, but it’s not really manageable for any real use.

Writing a Power BI data connector.

There are a few guides on this; I found the Microsoft repo on GitHub for Data Connectors to be super handy. In essence you write them in β€˜M’ (Power Query), which is Power BI’s query language of choice. I opted to write my connector in Visual Studio – so went and got the Power Query SDK extension.

The nice thing about this is that it allows me to test my connector without needing to constantly start/stop PowerBI. So! We get that installed and create a new Data Connector project:

New Project

This gets you a new Data Connector project with two files that you’ll initially care about: a .pq file and a .query.pq file – the latter being a β€˜unit test’ file. Let’s first look at the .pq file.

.PQ

A .pq file is simply a PowerQuery file; it’s written in M, and if you’re a PowerBI specialist – I assume that’s all good – for a non-PowerBI user (me) it means learning some stuff.

So, if you just F5 the project you should get a swirly thing, followed by an error saying credentials are needed.

Select ‘Anonymous’

Then press ‘Set Credential’ – then press F5 again – results!

OK, so what did we actually run when we pressed F5? Remember the .query.pq file? That is executing the Contents() query on the default connector.

let
    result = PQExtension1.Contents()
in
    result

OK, so far so – hum drum. This is really to get you used to the Power BI development experience. The good news is that we can just copy / paste from the old post I did and we can have a working function – taking into account that we have the same data (the Movies DB) and the same user / pass (neo4j/neo).

[DataSource.Kind="PQExtension1", Publish="PQExtension1.Publish"]
shared PQExtension1.Contents = () =>
let
    Source = 
        Json.Document(
            Web.Contents("http://localhost:7474/db/data/transaction/commit",
            [
                Headers=[Authorization="Basic bmVvNGo6bmVv"],
                Content=Text.ToBinary("{""statements"" : [ {
                        ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
                        }")
            ])),
    results = Source[results]
in
    results;

You should get a result saying [Record] – if you get that, you have successfully connected to Neo4j! Good job! Now, first things first, let’s strip out the Authorization header and auto-generate it.

Authorization

We have two forms, anonymous and user/pass. For anonymous, we don’t want to send the header, for user/pass – we obviously do. Let’s start with user/pass as we’re already there.

So let’s add another function, it’ll generate the headers for us, let’s firstly hardcode it:

DefaultRequestHeaders = [
    // base64-encode "user:pass" for the Basic auth header
    #"Authorization" = "Basic " & Binary.ToText(Text.ToBinary("neo4j:neo"), BinaryEncoding.Base64)
];

Changing our function to be:

[DataSource.Kind="PQExtension1", Publish="PQExtension1.Publish"]
shared PQExtension1.Contents = () =>
let
    Source = 
        Json.Document(
            Web.Contents("http://localhost:7474/db/data/transaction/commit",
            [
                //Change HERE vvvvv
                Headers=DefaultRequestHeaders,
                Content=Text.ToBinary("{""statements"" : [ {
                        ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
                        }")
            ])),
    results = Source[results]
in
    results;

Pressing F5 will connect and get the same result. So – we know we can do the base64 conversion in PowerBI – this is good. But, still – not ideal to have usernames and passwords hardcoded – for some reason. So let’s let PowerBI get us a user/pass.

Navigate to the β€˜Data Source Kind description’ section and add UsernamePassword there:

// Data Source Kind description
PQExtension1 = [
    Authentication = [
        // Key = [],
        UsernamePassword = [],
        // Windows = [],
        Implicit = []
    ],
    Label = Extension.LoadString("DataSourceLabel")
];

You can add things like ‘UsernameLabel’ in there if you want (I have, but for the purposes of this – I’m not gonna bother) – they make it look pretty for the PowerBI people out there πŸ™‚

OK, when you connect now (you might have to delete your credentials – the easiest way being selecting the ‘Credentials’ tab in the M Query Output window and ‘Delete Credential’) you will be able to select User/Pass as an option

But hey! We’re not actually using it yet! So let’s get the values from PowerBI using a handy function (that is really hard to find out about) called: Extension.CurrentCredential() with which we can get the username / password, so let’s update our DefaultRequestHeaders to use it:

DefaultRequestHeaders = [
    #"Authorization" = "Basic " & Neo4j.Encode(Extension.CurrentCredential()[Username], Extension.CurrentCredential()[Password])
];
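
Neo4j.Encode isn’t a built-in M function – it’s a small helper in the connector source; a minimal sketch of what it could look like (my reconstruction, mirroring the hardcoded base64 header from earlier):

// hypothetical helper – base64-encode "user:pass" for the Basic auth header
Neo4j.Encode = (user as text, password as text) as text =>
    Binary.ToText(Text.ToBinary(user & ":" & password), BinaryEncoding.Base64);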

OK, the dream is alive! For anonymous, basically we want to remove the headers, to do that we need to check what type of authentication is in use, and we’re back to the hilariously undocumented Extension.CurrentCredential method again:

Headers = if Extension.CurrentCredential()[AuthenticationKind] = "Implicit" then null else DefaultRequestHeaders,

We look for ‘Implicit’ as that’s what Anonymous is – with this we set the Headers to null if we’re anonymous, and the headers if not – ACE!

Getting Stuff

The crux of the whole operation, now we’re able to connect with user/pass and anonymous, it’s probably time we dealt with the hardcoded Cypher. Let’s take in a parameter to our method:

[DataSource.Kind="PQExtension1", Publish="PQExtension1.Publish"]
shared PQExtension1.Contents = (cypher as text) =>
let
    Source = 
        Json.Document(
            Web.Contents("http://localhost:7474/db/data/transaction/commit",
            [
                Headers=DefaultRequestHeaders,
                //Change HERE vvvvv
                Content=Text.ToBinary("{""statements"" : [ {
                        ""statement"" : "" " & cypher & " ""} ]
                        }")
            ])),
    results = Source[results]
in
    results;

Excellent, now we need to change our .query.pq file to call it:

let 
	result = PQExtension1.Contents("MATCH (n) RETURN COUNT(n)")
in
	result

F5 and see what happens – now we’re passing Cypher to the instance, and getting back results.

This largely covers how to build your own connector, if you look in the source code (and I encourage you to – it’s only 156 lines long including comments) – in it you’ll see I’ve abstracted out some of the stuff we’ve done here, named things properly, and I also pull in the address, port and scheme to allow a user to set it.
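
To give a flavour of that last part, here’s a rough sketch (not the actual source – the shape and names are my assumption) of how the scheme, address and port could feed into the request, reusing the pieces built above:

shared Neo4j.ExecuteCypher = (cypher as text, scheme as text, address as text, port as number) =>
let
    // builds e.g. "http://localhost:7474/db/data/transaction/commit"
    url = scheme & "://" & address & ":" & Number.ToText(port) & "/db/data/transaction/commit",
    // drop the auth header entirely for anonymous connections
    headers = if Extension.CurrentCredential()[AuthenticationKind] = "Implicit" then null else DefaultRequestHeaders,
    Source = Json.Document(
        Web.Contents(url,
        [
            Headers = headers,
            Content = Text.ToBinary("{""statements"" : [ { ""statement"" : "" " & cypher & " ""} ] }")
        ])),
    results = Source[results]
in
    results;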

Better Know APOC #4: apoc.coll.sort*

Neo4j Version 3.3.4
APOC Version 3.3.0.2

If you haven’t already, please have a look at the intro post to this series, it’ll cover stuff which I’m not going to go into on this.


OK, β€˜apoc.coll’ has 43 (that’s right – 43) functions and procedures, but I’m only going to cover the β€˜sort’ ones for this post – why? Because a post containing 43 different functions – whilst covering a good percentage of the overall set – would be way too long.

As it is, with ‘sort’ we have 4 functions:

  • apoc.coll.sort
  • apoc.coll.sortMaps
  • apoc.coll.sortMulti
  • apoc.coll.sortNodes

The Whys

These are methods to sort collections, the clue is in the name, but why do we need them? We can sort in Cypher right? We have ‘ORDER BY‘, who wrote this extra bit of code that has no use?? Who?!?!

When Hunger strikes, run for cover

Hunger. Hmmm given his pedigree we may have to assume this was done for a reason… Let’s explore that a bit with the apoc.coll.sort method…

apoc.coll.sort

This is your basic sort method, given a collection, return it sorted. It doesn’t matter what type the collection is, it will sort it.
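
For instance, it’s just as happy with strings as with numbers:

WITH ['banana', 'apple', 'cherry'] AS collection
RETURN apoc.coll.sort(collection) AS sorted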

Parameters

Just the one for in and one for out, the in is the collection to sort, the out is the sorted collection.

Examples

We’ll look (for this case) at doing it the traditional Cypher way, and then the APOC way.

The Cypher way

It’s worth seeing the Cypher way so you can appreciate sort, this is based on this question on Stack Overflow.

We’ll have a collection which is defined as such:

WITH [2,3,6,5,1,4] AS collection

Let’s sort this the Cypher way – easy!

WITH [2,3,6,5,1,4] AS collection
 RETURN collection ORDER BY ????

Errr, ok, looks like we’re gonna need to tap into some unwinding!

WITH [2,3,6,5,1,4] AS collection
UNWIND collection AS item
WITH item ORDER BY item
RETURN collect(item) AS sorted

That’s got it! So we UNWIND the collection, then WITH each item (ORDER BY item) we COLLECT them back again.

The APOC way

WITH [2,3,6,5,1,4] AS collection
RETURN apoc.coll.sort(collection) AS sorted

That’s a lot easier to read, it’s also a lot easier to use inline. The Cypher version above might look ok, but imagine you have a more complicated query, and you need to either do multiple sorts, or even just anything extra, it can quickly become unwieldy.

apoc.coll.sortMaps

A Map (or Dictionary for those .NETters out there) is the sort of thing we return from Neo4j all the time, and this function allows us to sort on a given property of a Map.

Examples

For these examples, we’ll have ‘coll’ defined as:

WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll

An array of maps, with an ‘Str‘ property, and a ‘Num‘ property.

Sort by string

WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll
RETURN apoc.coll.sortMaps(coll, 'Str')

Returns us a list of the maps, looking like:

╒══════════════════════════════════════════════════════════════════════╕
β”‚"apoc.coll.sortMaps(coll, 'Str')"                                     β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•‘
β”‚[{"Str":"A","Num":4},{"Str":"B","Num":3},{"Str":"C","Num":2},{"Str":"Dβ”‚
β”‚","Num":1}]                                                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

In which we can see the maps go from ‘A’ to ‘D’

Sort by Number

WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll
RETURN apoc.coll.sortMaps(coll, 'Num')

Unsurprisingly, this gets us the following:

╒══════════════════════════════════════════════════════════════════════╕
β”‚"apoc.coll.sortMaps(coll, 'Num')"                                     β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•‘
β”‚[{"Str":"D","Num":1},{"Str":"C","Num":2},{"Str":"B","Num":3},{"Str":"Aβ”‚
β”‚","Num":4}]                                                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Which goes from 1 to 4.

Sort order is ascending – there is no way to do a descending sort directly. You basically do a β€˜reverse’ to get the sort the other way.
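
So a descending sort ends up looking something like this (apoc.coll.reverse ships with APOC as well):

WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll
RETURN apoc.coll.reverse(apoc.coll.sortMaps(coll, 'Num')) AS sortedDesc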

apoc.coll.sortMulti

This is the equivalent of doing a ‘Sort, Then By’ – so if I take the ‘sortMaps’ function above and run it like so:

WITH [{First:'B', Last:'B'}, {First:'A', Last:'A'}, {First:'B', Last:'A'}, {First:'C', Last:'A'}] AS coll
RETURN apoc.coll.sortMaps(coll, 'First')

I get:

╒══════════════════════════════════════════════════════════════════════╕
β”‚"apoc.coll.sortMaps(coll, 'First')"                                   β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•‘
β”‚[{"Last":"A","First":"A"},{"Last":"B","First":"B"},{"Last":"A","First"β”‚
β”‚:"B"},{"Last":"A","First":"C"}]                                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The problem here is the two elements:

{"Last":"B","First":"B"},{"Last":"A","First":"B"}

I want these to be the other way around, so I have to switch to ‘Multi’:

WITH [{First:'B', Last:'B'}, {First:'A', Last:'A'}, {First:'B', Last:'A'}, {First:'C', Last:'A'}] AS coll
UNWIND apoc.coll.sortMulti(coll, ['^First', '^Last']) AS unwound
RETURN unwound.First AS first, unwound.Last AS last

This gets me:

╒═══════╀══════╕
β”‚"first"β”‚"last"β”‚
β•žβ•β•β•β•β•β•β•β•ͺ══════║
β”‚"A"    β”‚  "A" β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€€
β”‚"B"    β”‚  "A" β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€€
β”‚"B"    β”‚  "B" β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€€
β”‚"C"    β”‚  "A" β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜

One thing to note here (and I think it’s quite important) is that this is the only method that defaults to descending order. To get an ascending sort, you have to prefix columns with a β€˜^’ character (as I’ve done in this case).
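
To see that default in action, drop the β€˜^’ prefixes (my example):

WITH [{First:'B', Last:'B'}, {First:'A', Last:'A'}, {First:'B', Last:'A'}, {First:'C', Last:'A'}] AS coll
RETURN apoc.coll.sortMulti(coll, ['First', 'Last'])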

apoc.coll.sortNodes

Nearly there! This takes a collection of nodes and sorts them on one property – so, let’s add some nodes:

CREATE (n1:CollNode {col1: 1, col2: 'D'})
CREATE (n2:CollNode {col1: 2, col2: 'C'})
CREATE (n3:CollNode {col1: 3, col2: 'B'})
CREATE (n4:CollNode {col1: 4, col2: 'A'})

And let’s do a sort:

MATCH (n:CollNode)
WITH apoc.coll.sortNodes(COLLECT(n), 'col2') AS sorted
UNWIND sorted AS n
RETURN n.col1 AS col1, n.col2 AS col2

Now, you could argue this adds little to the party as you can already ORDER BY, and by and large you’re right – the nice thing about the apoc version is that you can call it as I have above, rather than having to do the sort afterwards. Having said that, ORDER BY does have a DESC keyword as well, which sortNodes does not :/
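
For comparison, the plain Cypher equivalent of the query above would be:

MATCH (n:CollNode)
RETURN n.col1 AS col1, n.col2 AS col2
ORDER BY n.col2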

Conclusions

apoc.coll.sort* is useful, that’s the main thrust, some are more useful than others, and there are a few omissions (like the ability to sort desc for all but the sortMulti method) which could be good simple pull requests.

They are what they are, sorting methods πŸ™‚

Neo4j with Azure Functions

Recently, I’ve had a couple of people ask me how to use Neo4j with Azure functions, and well – I’d not done it myself, but now I have – let’s get it done!

  1. Login to your Azure Portal

  2. Create a new Resource, and search for β€˜function app’:


  1. Select β€˜Function App’ from the Market Place:


  1. Press β€˜Create’ to actually make one:


  5. Fill in your details as you want them


I’m assuming you’re reasonably au fait with the settings here; in essence, if you have a Resource Group you want to put it into (maybe something with a VNet) then go for it – in my case, I’ve just created a new instance of everything.

  6. Create the function, and wait for it to be ready. Mayhaps make a tea or coffee, have a break from the computer for a couple of mins – it’s all good!


  7. When it’s ready, click on it and go into the Function App itself (if it doesn’t take you there!)

  8. Create a new function:


  9. We want to create an HttpTrigger function in C# for this instance:


  1. This gives us a β€˜run.csx’ file, which will have a load of default code, you can run it if you want,


and you’ll see an output window appear which will say:


Well – good – Azure Functions work, so let’s get a connection to a Neo4j instance – now – for this I’m assuming you have an IP to connect to – you can always use the free tier on GrapheneDB if you want to play around with this.

  11. Add references to a driver

We need to add a reference to a Neo4j client, in this case, I’ll show the official driver, but it will work as well with the community driver. First off, we need to add a β€˜project.json’ file, so press β€˜View Files’ on the left hand side –


Then add a file:


Then call it project.json – and yes it has to be that name:


With our new empty file, we need to paste in the nuget reference we need:

{
  "frameworks": {
    "net46": {
      "dependencies": {
        "neo4j.driver": "1.5.2"
      }
    }
  }
}

Annoyingly if you copy/paste this into the webpage, the function will add extra β€˜closing’ curly braces, so just delete those.


If you press β€˜Save and Run’ you should get the same response as before – which is good, as it means that the Neo4j.Driver package has been installed; if we look at the files, we’ll see the β€˜project.json.lock’ file, which is what we want.


  12. Code

We want to add our connection information now, we’re going to go basic, and just return the COUNT of the nodes in our DB. First we need to add a β€˜using’ statement to our code:

So add,

using Neo4j.Driver.V1;

Then replace the code in the function with:

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    using (var driver = GraphDatabase.Driver("bolt://YOURIP:7687", AuthTokens.Basic("user", "pass")))
    {
        using (var session = driver.Session())
        {
            IRecord record = session.Run("MATCH (n) RETURN COUNT(n)").Single();
            int count = record["COUNT(n)"].As<int>();
            return req.CreateResponse(HttpStatusCode.OK, "Count: " + count);
        }
    }
}

Basically, we’ll create a Driver, open a session and then return a 200 with the count!

  13. Run

You can now β€˜Save and Run’ and your output window should now tell you the count:


  14. Done

Your first function using Neo4j, Yay!

Neo4jClient turns 3.0

Well, version wise anyhow!

This is a pretty big release and is one I’ve been working on for a while now (sorry!). Version 3.0 of the client finally adds support for Bolt. When Neo4j released version 3.0 of their database, they added a new binary protocol called Bolt designed to be faster and easier on the ol’ network traffic.

For all versions of Neo4jClient prior to 3.x you could only access the DB via the REST protocol (side effect – you also had to use the client to access any version of Neo4j prior to 3.x).

I’ve tried my hardest to minimise the disruption that could happen, and I’m pretty happy to have it down to mainly 1 line change (assuming you’re passing an IGraphClient around – you are right??)

So without further ado:

var client = new GraphClient(new Uri("http://localhost:7474/db/data"), "user", "pass");

becomes:

var client = new BoltGraphClient(new Uri("bolt://localhost:7687"), "user", "pass");
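
As a sketch of that in context (my example – a simple count query; note the fluent query code itself doesn’t change):

// the only change is the construction – the query code stays the same
IGraphClient client = new BoltGraphClient(new Uri("bolt://localhost:7687"), "user", "pass");
client.Connect();

// needs System.Linq for Single()
var count = client.Cypher
    .Match("(n)")
    .Return(n => n.Count())
    .Results
    .Single();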

I’ve gone through a lot of trials with various objects, and I know others have as well (thank you!) but I’m sure there will still be errors – nothing is bug free! So raise an issue – ask on StackOverflow or Twitter πŸ™‚

If you’re using `PathResults` you will need to swap to `BoltPathResults` – the format changed dramatically between the REST version and the Bolt version – and there’s not a lot I can do about it I’m afraid!

Better Know APOC #2: apoc.graph.fromDB

A rubbish picture that really doesn't represent anything - but you could say 'apoc.graph.fromDB' - honestly - you're not missing anything here

Neo4j Version 3.0.0
APOC Version 3.3.0.1

If you haven’t already, please have a look at the intro post to this series, it’ll cover stuff which I’m not going to go into on this.

Back at the beginning of this series (if you can remember that far!) I talked about using apoc.export.csv.* – and I showed an example of using apoc.export.csv.graph that took in a graph – and to get that graph, I used apoc.graph.fromDB. I also said I wasn’t going to cover it in that post – and I didn’t. Time to rectify that lack of knowledge!

What does it do?

apoc.graph.fromDB takes your existing DB and creates a whole new virtual graph for your use later on – we’ve seen it in use in episode 1 – the phantom men… sorry – apoc.export.csv.graph – but a virtual graph can be used in other procedures. This particular instance is a hefty β€˜catch all’ version – maybe overkill for most needs – but equally – maybe exactly what you’re after (if you’re after dumping your DB).

Setup – Neo4j.conf

dbms.security.procedures.unrestricted=apoc.graph.fromDB

Ins and Outs

Calling apoc.help(β€˜apoc.graph.fromDB’) gets us:

Inputs (name :: STRING?, properties :: MAP?) ::
Outputs (graph :: MAP?)

Inputs

Only two this time, and I reckon you can pretty much ignore them, so that’s a win?!

Name

This is as simple as it seems – just the name – I’m going to be honest here – I really am not sure what this is for – you can access it later on though. I’m pretty sure this is a hangover from the other apoc.graph.from* methods – where it makes more sense as a distinguisher – but for this procedure – as we’re just exporting the whole db, go for whatever you like.

Properties

Just a collection of key/values – accessible after the procedure has executed – but otherwise not used by the procedure.

Outputs

Just the one! Amazeballs!

Graph

This is what you need to YIELD to use the procedure (the examples will cover this) – to access the name you use:

RETURN graph.name

To get your properties it’s:

RETURN graph.properties.<your-property-here>

Examples

Assuming as always that you have the Movies DB setup and ready to roll, just call:

CALL apoc.graph.fromDB('Movies', null) YIELD graph

That will pop the whole DB into your browser – now if you do this with a MONSTER database, you’ll only see the first 300 nodes – otherwise no matter your browser you could expect epic failures.

Typically we want to RETURN something rather than just put it on the screen so:

CALL apoc.graph.fromDB('A1Graphs', null) YIELD graph
RETURN *

Oh look – exactly the same – HANDY.

Let’s (for the sake of something) pretend that we have 2 of these and we’re wanting to check the name:

CALL apoc.graph.fromDB('A1Graphs', null) YIELD graph
RETURN graph.name

That’ll get us:


(and the award for the dullest blog post picture goes to..)

Let’s set and get some properties:

CALL apoc.graph.fromDB('A1Graphs', {Hello: 'World'}) YIELD graph
RETURN graph.properties

Which returns


But if we just want one property:

CALL apoc.graph.fromDB('A1Graphs', {Hello: 'World'}) YIELD graph
RETURN graph.properties.Hello

Note – I’ve used an upper case property name, so I have to use the same case when pulling them out – (I refuse to be cowed into Java conventions). Anyhews, that returns:


Notes

You always need to YIELD unless you literally want to dump your DB to the screen – doing something like:

CALL apoc.graph.fromDB('A1Graphs', null) AS myGraph

Will lead to exceptions – as Neo4j is expecting you to YIELD, you can do:

CALL apoc.graph.fromDB('A1Graphs', null) YIELD graph AS myGraph

and use myGraph throughout the rest of your code no worries.

Better Know APOC #1: apoc.export.csv.*

Export CSV from Neo4j with APOC

Neo4j Version 3.3.0
APOC Version 3.3.0.1

If you haven’t already, please have a look at the intro post to this series, it’ll cover stuff which I’m not going to go into on this.

We’re going to start with the Export functions – in particular exporting to CSV, mainly as people have asked about it – and well – you’ve got to start somewhere.

apoc.export.csv.*

There are 4 functions documented on the GitHub.IO page:

  • apoc.export.csv.query(query, file, config)
  • apoc.export.csv.all(file, config)
  • apoc.export.csv.data(nodes, rels, file, config)
  • apoc.export.csv.graph(graph, file, config)

All of them export a given input to a CSV file, specified by the file parameter.

Setup

There’s a couple of things we need to have in place to use these methods.

Neo4j.conf

We need to let Neo4j know that you allow it to export, and also run the export csv procedures:

apoc.export.file.enabled=true
dbms.security.procedures.unrestricted=apoc.export.csv.*

apoc.export.csv.query

In no particular order (aside from the docs order) we’ll look at the .query version of export.csv. This procedure takes a given query and exports it to a CSV file – the format of the RETURN statement in the query directly affects the output, so if you return nodes, you get full node detail.

From the help procedure we get the following for the signature:

Inputs (query :: STRING?, file :: STRING?, config :: MAP?)
Outputs (file :: STRING?, source :: STRING?, format :: STRING?, nodes :: INTEGER?, relationships :: INTEGER?, properties :: INTEGER?, time :: INTEGER?, rows :: INTEGER?)

Inputs

Query

I hope this is obvious – but – it’s the query you want to use to get your CSV columns – personally I write the query first, make sure it’s working, then simply copy/paste it into my apoc call. So, let’s say I want to get all the Movies a given Person (Tom Hanks) has ACTED_IN, I would do:

MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title, m.released

File

This is the filename to export to – I always go fully qualified, but should you want to go relative, it’s relative to the Neo4j home directory

Config

There is only one config setting that affects this procedure:


Parameter | Description
d         | Sets the delimiter for the export; this can only be one character, so something like '-' is OK, or '\t' (for tabs)

Outputs

File

This is what you passed in – it doesn’t give you the fully qualified location, just what you passed in.

Source

This will say something like β€˜statement: cols(2)’ to indicate the query returned 2 columns, I don’t think I need to explain that the number will change depending on what you return.

Format

Always gonna be csv

Nodes

If you’re returning nodes instead of just properties, this will give you the count of nodes you are exporting.

Relationships

This returns the count of the relationships being returned.

Properties

Will be 0 if you’re just returning nodes, otherwise will be a total count of all the properties returned. So if you’re returning 2 properties, but from 12 nodes, you get 24 properties.

Time

How long the export took in milliseconds – bear in mind – this will be less than the total query time you’ll see in something like the browser, due to rendering etc

Rows

The number of rows returned by the query and put into the CSV file – directly matches the number of lines you’ll have in your CSV file.

Examples

We want to export all the Movie titles and release years for films Tom Hanks has ACTED_IN – we’ve already got that query in place (up above) so let’s put it into export.csv:

CALL apoc.export.csv.query(
    "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title, m.released",
    "c:/temp/exportedGraph.csv",
    null
)

I like to put the parameters onto different lines, you can whack it all in one line if that’s your fancy! I’m passing null for the config at the moment, as there’s no change I want to make. If I run this, I will get:


We can change that and return just the β€˜m’:

CALL apoc.export.csv.query(
    "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m",
    "c:/temp/exportedGraph.csv",
    null
)

Which gives a lot more detail – about all of the node:


OK, let’s play with the parameter, now I’m a big fan of the tab delimited format, so let’s make that happen:

CALL apoc.export.csv.query(
    "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title, m.released",
    "c:/temp/exportedGraph.csv",
    {d:'\t'}
)

That gets us:


apoc.export.csv.all(file, config)

A lot of the following detail is the same as for the above procedure, so this will be short (in fact the next couple as well).

Config is the same – just one parameter to play with – ‘d’ or delimiter. What we don’t have is the ‘query’ parameter anymore – now we get the entire content of the database in one go – boom!

//Tab delimiting the 'all' query
CALL apoc.export.csv.all(
   "c:/temp/exportedGraph.csv", 
   {d:'\t'}
)

What you’ll find is that as you have no control over the format of the result, the authors of APOC have kindly pulled all the properties out, so if you run it against our Movies database, you get:

"_id","_labels","name","born","released","title","tagline","_start","_end","_type","roles","rating","summary"

as the header, and obviously all the rows are all the nodes!

But hang on… Movie doesn’t have a born property (at least in our DB). If we look at our DB we actually have a Movie node and a Person node, and all is dumping out everything – when you scroll down the file you’ll see the different rows have their _labels property.

(To be said in a Columbo voice) “Just one last thing”… we also get the relationship details – if you scroll down the file you’ll see rows like:

,,,,,,,"322","325","ACTED_IN","[""Neo""]","",""

These are the relationships – so you really are getting everything.

apoc.export.csv.data(nodes, rels, file, config)

OK, in this one we’re looking to pass a collection of Nodes and Relationships to be exported, but how do we get those nodes and relationships? With a query!

MATCH (m:Movie)<-[r:ACTED_IN]-(p:Person {name:'Tom Hanks'})
WITH COLLECT(m) AS movies, COLLECT(r) AS actedIn
CALL apoc.export.csv.data(
 movies, 
 actedIn, 
 'c:/temp/exportedGraph.csv', 
 null
) YIELD file, nodes, relationships
RETURN file, nodes, relationships

A couple of things to note, in line 2, we COLLECT the m and r values so we can pass them to the APOC procedure.

You may ask “Hey Chris, why exactly are you YIELDing? We didn’t need to do that before”, and you’re right of course. But because we have a MATCH query at the top the Cypher parser won’t let us end with a CALL clause, so we need to YIELD the result from the CALL and then RETURN those results as well (we’ll have to do this with the next procedure as well).

Now we’ve got the nodes and relationships exported we’ll find when we look at our CSV that it’s taken the same approach as theΒ all procedure and picked out the properties so we’ll end up with a header like:

"_id","_labels","title","tagline","released","_start","_end","_type","roles"

Which makes a LOAD CSV call super easy later on – should that be your dream goal (cypher dreams are reserved for Michael Hunger only πŸ™‚ ).
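
As a rough sketch of that (my example – assuming the file sits somewhere LOAD CSV can read, and that the _labels column holds β€˜:Movie’ for the movie rows – check your file):

LOAD CSV WITH HEADERS FROM 'file:///exportedGraph.csv' AS row
WITH row WHERE row._labels = ':Movie'
MERGE (m:Movie {title: row.title})
SET m.tagline = row.tagline, m.released = toInteger(row.released)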

Config wise – it’s the same as the others, only the ‘d’ option for the delimiter.

apoc.export.csv.graph(graph, file, config)

This procedure exports a graph – simple really. But what is a graph when it’s at home? Well that leads to another APOC procedure – apoc.graph.fromDB – which we’ll not cover here; needless to say Imma going to use it:

CALL apoc.graph.fromDB('movies', {}) YIELD graph
CALL apoc.export.csv.graph(
 graph,
 'c:/temp/exportedGraph.csv',
 null
 ) YIELD file, nodes, relationships
RETURN file, nodes, relationships

Say WHAT??!

OK, simples – the first line exports the whole database (I’m not doing any filtering here) into a graph identifier which I then pass into the call to export.

This exports in exactly the same way as the all version – so if you pass an entire DB – you get an entire DB. In fact – the code above does exactly the same as the call to all, just in a more complicated way, and in keeping with that theme, the old faithful β€˜d’ for delimiter config parameter is there for your use.

Summary

OK, we’ve looked at all the export.csv procedures now, (unless in the time I’ve taken to write this they’ve put another one in) and hopefully I’ve cleared up a few questions about how it works and what exactly the configuration options are. If not, let me know and I’ll augment.