Using a Data Flow to move data **from** Neo4j in SSIS

That’s right everyone! We’re going *from* Neo4j this time, and this is a new release: the old version (1.0.0.0) didn’t have a ‘Neo4j as a Source’ component, but 1.1.0.0 does.

In the last post we took data from a file and ingested it into Neo4j – so far so good – but one of the things we were missing was the ability to also pull *from* Neo4j. Now the circle is complete, and in this post I’m going to show you how to pull from one Neo4j instance into another. That’s right – Neo4j to Neo4j!

As always – the video below shows the moving version of this post – but not everyone wants that.

The Setup

OK, this is more complex than normal, as we need multiple instances of Neo4j running – and whilst that’s not rocket science, it does take a little setting up. I don’t want to go into it in detail – but hey! I would run one DB in Neo4j Desktop, then download one of the server editions (for this, Community will be just fine!)

Ports!

You need to change the ports on your new server, as the Desktop ones will be using 7474/7687 etc – So open up the neo4j.conf file and change the following settings:
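For reference, the relevant neo4j.conf entries look something like this – the setting names assume a Neo4j 3.x server, and 7676/7688 are just the values I picked (choose your own):

```
# neo4j.conf – example port overrides so the server doesn't clash with Desktop
# (setting names assume Neo4j 3.x)
dbms.connector.http.listen_address=:7676
dbms.connector.bolt.listen_address=:7688
```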

These are the ports I’m using – but go crazy and pick whatever you want – it’s your database after all. Aaaanyhews – I’m going to assume you know how to start your server version of the database. If not – there’s loads of stuff online about how to do it – and if it becomes clear that we’re in a world of pain here – I’ll write one πŸ™‚

Clear the DBs

WARNING!!! – hopefully not needed – but here you go: make sure you know which DB you are doing this on! Don’t delete your production DB by mistake!!! (β€’_β€’)

We’re going to clear both DBs, and add the ‘Movies’ demo set to one of them. So open up a browser window to both instances (http://localhost:7474 and http://localhost:7676 in my case) and execute:

MATCH (n) DETACH DELETE n

Then in one of the databases (and I will be using my 7474 database) execute:

:play movies

Step through to the second slide and put the movie data into your database. You can check the data is all there by running:

MATCH (n) RETURN COUNT(n)

You should get 171 nodes. OK, now we’re all set up and ready to go!

Let’s SSIS

As another assumption – I’m going with the fact that you know how to start up Visual Studio and create a new Package.

Let’s first add one connection:

And, obvs pick Neo4j:

Oooh – note the ‘version’ there as well – if yours says lower than that – then bad times πŸ™

Rename it to something like ‘The Source’ or whatever you find memorable:

Make sure the user / pass and server are all correct:

Looking good! Now – repeat – for the other server – remembering the port will be different – and choosing a different name, something like ‘The Destination’ for example, and you should end up with this state of affairs:

Let’s add a ‘Data Flow’ to our package now, again you can rename if you want. I did, but don’t let that force you into doing anything:

Double click on it, and we’re into Data Flow design heaven!

Add the Source

Drag the ‘Execute Cypher Source’ component from the toolbox onto the page:

Double click on it to enter the ‘Edit’ page:

The Cypher we’re going to execute is:

MATCH (m:Movie) RETURN m.title AS title

Now – some TOP TIPS. This works best if you RETURN specific columns – SSIS doesn’t know what to do with a full node – and using the AS there makes the output columns easier to use.
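As a sketch of the difference (m.released is another property in the movies dataset):

```cypher
// Good: specific, aliased columns SSIS can map to output columns
MATCH (m:Movie) RETURN m.title AS title, m.released AS released

// Avoid: returning a whole node – SSIS can't map this
MATCH (m:Movie) RETURN m
```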

Once you’ve got the Cypher – you need to select the Connection to use (see the picture) – which is why naming them nicely is SUPER useful.

Once you’ve done that, hit ‘Refresh’ to get the Output Columns populated:

Job done. Good work!

Add the Destination

No surprises for guessing this involves dragging the Destination to the page.

Next, join up the Source to the Destination:

The UI for this is not as fully fleshed out as the other, so unfortunately we need to head into the Advanced Editor. So right-click on it, and open the Advanced Editor:

First we want to set the connection:

Again – naming!!

Then we’re going to go to the ‘Input’ tab and select our input from the Source:

Press OK to save all that, and then double click on the Destination item and go to the Cypher Editor:

First off – you can see the ‘title’ listed in the parameters, so that’s good – Cypher-wise we’re doing a MERGE – so we only get one ‘Cloud Atlas’ (because no-one needs more than one of those).

MERGE (:Movie {title: $title})

At this point, we have our two things and no red crosses or errors anywhere, so let’s run it!

Run it!

No surprises – we press ‘Start’ and get the ‘liney’ version of the page which hopefully you see as:

38 rows (hahaha Rows!) and if you go to your ‘Destination’ database you should see the movies there.
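If you want to double-check on the destination side, a quick count should match the row figure above:

```cypher
MATCH (m:Movie) RETURN COUNT(m)
```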

I want it

Of course you do – these controls are currently in an open beta, to register to get the controls, please go to: http://bit.ly/neo4jssis

Using a Data Flow to move data from who knows where to Neo4j in SSIS

In what is rapidly becoming a series of posts – we look into another of the components in the Anabranch SSIS Components for Neo4j package. The last post looked at using the “Execute Cypher Task” from within a Control Flow, but that only gets us so far – it’s great for things like deleting a DB, adding indexes etc, but when we want to get data from one source to another, we gotta go all Data Flowy.

I’m working on the principle that you’ve gone through the last post; I’m going to pick up from where we left off, and I make no apologies for my assumptions.

Clear the DB

I should mention – please check which DB instance you are connected to – nothing says ‘problem’ quite like deleting your production database.

Let’s first clear the Neo4j instance back to an empty state, run:

MATCH (n) DETACH DELETE n

In the browser.

Clear Package

We don’t want the Execute Cypher Task any more, so select it and press Delete, or go all Mousey and right-click – the choice is yours.

Deleting the mouse way

Let’s Data Flow (Task)!

Drag a Data Flow Task onto the Control Flow workspace:

Double click on the Task to be taken to the Data Flow workspace, which will be empty. So let’s drag a ‘Flat File Source’ to the space:

Double click on the Flat File Source, and the editor will pop up. We need to add a new Connection Manager, so press ‘New…’

Now, we want to use a CSV file – you can use the one I use by downloading from this link, it’s not very exciting I’m afraid, just some names πŸ™‚ Anyhews – fill in the details that match your file (the ones in this picture match my file; the only thing I’ve changed from the defaults is the Code page, to 65001 (UTF-8))

Then click on the ‘Columns’ bit on the left hand side, to make sure it all looks ok, and press OK. You’ll be back to the ‘Flat File Source Editor’ – and you should now click on the ‘Columns’ bit here too:

Make sure at least the First/Last names are checked here – obviously if you’re using your own file – pick your columns! Press OK and go back to the workspace.

Now drag an ‘Execute Cypher Destination’ task to the workspace:

Drag the ‘Blue arrow’ from the Flat File Source, and attach it to the Execute Cypher task:

Then, right click on the execute cypher task, and select ‘Show Advanced Editor…’

First, set the connection manager, we want to use our existing Neo4j Connection Manager

Then we want to select the ‘Input Columns’, just pick them all for now:

Press OK, and then Double click on the Execute Cypher Task, to get the Cypher Editor

Add the following Cypher:

CREATE (:User {First: $FirstName, Last: $LastName})

And press OK.

Do some SSISing!

Now, all that’s left to do is press Start (or Right-click – Execute Task) whichever is your preference!

It’ll run, and give you the following:

Which you can check in your DB by running:

MATCH (n) RETURN COUNT(n)
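To spot-check the properties as well (using the First/Last property names from the CREATE earlier):

```cypher
MATCH (u:User) RETURN u.First, u.Last LIMIT 5
```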

Things are a bit more interesting now, as we’re pulling from a different source and putting into the database – and obviously SSIS supports loads of sources.

These controls are currently in an open beta, to register to get the controls, please go to: http://bit.ly/neo4jssis

Neo4j & SSIS – Connecting and executing Cypher in a Control Flow

Last Friday, Anabranch released the first beta version of its connector to Neo4j from SSIS. Aside from a post saying that it existed, I didn’t go into detail, so this is going to be a series of posts on how you can use your existing SSIS infrastructure with Neo4j.

Today we’re going to look at 2 parts of the connector, the Neo4j Connection Manager (CM) and the Execute Cypher Task (ECT). The CM is fundamental to all the controls, without it, you can’t connect to the database. I’ll go into what it does, settings etc in another post, but for now – it’s enough to know that it provides the connection. The ECT allows us to execute Cypher against a given connection manager.

** NOTE **
In version 1.0.0(beta) – the ECT will only work with the first CM you add to the package

This video covers the same topic as the text version below:

I’m going to develop this in Visual Studio 2017, at the time of writing – I found the 2019 SSIS packages to be a bit flakey, whereas the 2017 has been sturdy so far – from a ‘demo’ point of view though – the 2019 process is exactly the same after you have it all installed.

SETUP
If you’ve never developed against SSIS before, you’ll need a couple of things: firstly SSDT (specifically the Integration Services bits), and Visual Studio – I think the Community edition should work, but I can’t confirm. You’ll also need the Anabranch SSIS Controls for Neo4j – assuming you’ve registered ( http://anabranch.co.uk/Projects/Neo4jSsis) and have the download link, you’ll want the 2017 x86 version of the controls – (for VS2019 as well!).

Download and install the controls. NB. You want to install these when Visual Studio isn’t running – as we’re in the heady world of the GAC here, and VS won’t find them unless it’s started with them there.

To do this example yourself – you’ll also need a Neo4j database instance running; I’d recommend using Neo4j Desktop as it makes it easier to manage the process.

Create your first package

1. Start up Visual Studio
2. Create a new Integration Services project

New Project…

3. In the new Package.dtsx file, we need to add a Connection Manager. Right click on the bottom ‘Connection Managers’ bar and add a Neo4j connection – if you don’t see it – you might have to restart Visual Studio, or possibly your machine.

Then select the Neo4j Connection:

You’ll now see it in the ‘Connection Managers’ section:

Select it – and change the connection properties to ones that match your database instance – at the moment this is done via the properties window:

At this stage, we have a connection – but we’re not using it, so let’s add a task to execute:

Drag the ‘Execute Cypher Task’ to the Control Flow, and double click on it. Then add the following Cypher:

CREATE (:Node {Id:1})

Press OK

Then we can execute the task:

Once that’s done:

If we go to our Neo4j Database, we can run:

MATCH (n:Node) RETURN n

If we look at the ‘Id’ property – we can see it is ‘1’

So. Now we have an SSIS integration package executing against a Neo4j database.

These controls are currently in an open beta, to register to get the controls, please go to: http://bit.ly/neo4jssis

Neo4j & SSIS

Neo4j and SSIS are awkward bedfellows – SSIS is Microsoft and has connectors to a plethora of database and technologies using ODBC, Web etc, and Neo4j is written in Java which provides a JDBC connection. SSIS however, does not work with JDBC.

#badtimes

Some of the clients I’ve worked with like using SSIS – (some don’t), and value their 20+ years of using a piece of technology, and want to leverage it with new technologies. Nothing says expensive like having to learn a new database and a new ETL tool.

So today I’d like to introduce you to the beta (maybe alpha) version of the Neo4j Connector for SSIS. It uses bolt to securely connect to your Neo4j instance and call Cypher against it.

Version 1 beta features:

  • Neo4j Connection Manager – manages the connection to the database, and securely encrypts your password (and that is actual encryption) making it safe for you to store.
  • Execute Cypher Task – Allows you to execute a piece of Cypher on a Neo4j instance as part of a Control Flow
  • Execute Cypher Destination – Allows you to execute Cypher against a Neo4j instance as part of a Data Flow
  • Both the above pictures show the basic syntax highlighting as well
  • Works with SSIS 2016, 2017 and 2019 (CTP 3)

Do you want to try it? You need the appropriate installer, and there are 6 flavours (6!!). If you’re installing on a server – you’ll probably want the x64 Server version. What? I know – not that clear. If you’re installing on a SQL Server 2016 instance, use the SQL 2016 x64 installer.

To use on a local designer (VS 2017 or 2019) you’ll want the x86 SQL 2017 version – as the Integration Services add-in for VS 2019 still uses the 2017 install locations.

Please go to here: http://anabranch.co.uk/Projects/Neo4jSsis to get a link via email!

Give me feedback and I’ll put more posts up on how to use some of the features shortly!

Actually using the new DataConnector for PowerBI

After I’d written it – I realised my last post was perhaps not the most useful for those who really don’t care about the how, but want to know what to do to use it. So this will follow the same deal as with the last post (over a year and a half ago!! WOW!).

Video version below if you want it:

The Setup Steps

First – we’ve got to install PowerBI – now, I didn’t sign up for an account, but downloaded it from the PowerBI website, and installing was simple and quick.

We also need to have Neo4j running, and you can use Community or Enterprise, it matters not – and we’ll want to put the β€˜Movies’ dataset in there, so run your instance, and execute:

:play movies

Add the Data Connector to Power BI

1. First – download the connector from the releases page (or build it yourself in VS) – you want the `Neo4j.mez` file.

Version 1: https://github.com/cskardon/Neo4jDataConnectorForPowerBi/releases/tag/1.0.0

2. PowerBI looks for custom connectors in the <USER>\Documents\Power BI Desktop\Custom Connectors folder, which if it doesn’t exist – you’ll need to create. Once you have that folder, copy the Neo4j.mez connector there.


3. Nearly there – we just need to allow PowerBI to load the connector now – so, start up PowerBI and go to the Options dialog:


Once there, select the β€˜Security’ option, and then under the Data Extensions header select the option allowing you to load any extension without validation or warning:


You’ll have to restart PowerBI to get the connector to be picked up – so go ahead and do that now!

Lets Get Some Data!

Now – I know a lot of you will have been excited by the Pie Chart from the last post – now you can create your own!

With a new instance of PowerBI running, let’s select β€˜Get Data’


We can now either search for ‘Neo4j’ or look in the ‘Database’ types:

Select Neo4j, then press ‘Connect’ – aaaaand a warning!

Read it, ignore it – it’s up to you – but this is just to let you know that it’s still in Beta (I mean it’s only had one release so far!) Continue if you’re happy to.

Now you’re given the boxes to enter your Cypher and connection information – the text box for the Cypher field is a single line (ugh) – so if it’s a complicated query you’re probably best off writing it in Sublime or similar (maybe even Notepad!?!). In this case, we can go simple:

MATCH (m:Movie) RETURN m;

Now, the other settings, if you’re running default settings – you can leave these, but obviously if you need to connect to https instead of http change it in the scheme setting. I’ve filled in my display with the defaults:

When you press OK, you get the Login dialog; if you are connecting anonymously, then select Anonymous, else – fill in your username / password.

Press ‘Connect’, and PowerBI will connect to your DB and return you back a list of ‘Record’:

We’ll want to ‘Edit’ this, so press ‘Edit’!

When the Power Query Editor opens press the expand column button at the top of the ‘m’ column:

Ooooooh, ‘tagline’, ‘title’ and ‘released’ — our movie properties! For this, I would turn off the ‘use original column name as prefix’ checkbox, leave them all selected and press OK.

Data!!

Now, let’s ‘Close & Apply’ our query – NB – if you look in the ‘Applied Steps’ section, you can see we only have 2 steps, ‘Source’ and ‘Expanded m’

Whilst PowerBI applies it just think of the Pie charts that lie ahead of us:

When that dialog disappears, we’re good to go! On the right you’ll see a ‘Fields’ section, and you should see something like this:

So, the moment we’ve all waited for…

Let’s Pie Chart

Select ‘Pie Chart’ from the Visualizations section:

Once it’s in your display – select it and drag the ‘released’ field from the Query1 to the ‘details’ field, and then title to the ‘values’ field:

Your chart should look something like this now:

So let’s max size it, and mouse over it, now we can see:

But we can also drag ‘title’ on to the Tooltips field like so to get the First (or last) movie in that group:

What does a query look like under the covers?

Some of us like visual, some like code, the last time we tried this – our query was 20 lines long – our new query though – that’s just 5!

let
    Source = Neo4j.ExecuteCypher("MATCH (m:Movie) RETURN m;", "http", "localhost", 7474),
    #"Expanded m" = Table.ExpandRecordColumn(Source, "m", {"tagline", "title", "released"}, {"tagline", "title", "released"})
in
    #"Expanded m"

Which is much nicer.

Hey hey! It’s Beta!

The Data Connector approach gives a much nicer way to query the database, it strips out a lot of the code we have to write, and hopefully makes the querying easier.

BUT – I am not a PowerBI expert – is this the right way to do this? Are there improvements? Some hardcoded queries we should have there? Let me know – do a PR – it’s all good!

PowerBI With Neo4j – How do you build a DataConnector?


TL;DR;

Repo is at: https://github.com/cskardon/Neo4jDataConnectorForPowerBi
Release at: https://github.com/cskardon/Neo4jDataConnectorForPowerBi/releases

Looky! Pie Charts!

Pie Chart of Movies

This glorious picture represents the very pinnacle of my PowerBI experience, beforehand I was pulling the data into Excel and charting myself – no longer!

Jokes aside, the big news here is that I’ve dramatically improved upon my previous post, where I showed how you could connect to a security-enabled Neo4j instance from PowerBI by generating your own base64-encoded string. All in all, that’s a terrible approach – sure, it works, but it’s not really manageable for any real use.

Writing a Power BI data connector.

There are a few guides on this; I found the Microsoft repo on GitHub for Data Connectors to be super handy. In essence you write them in ‘M Power Query’, which is Power BI’s query language of choice. I opted to write my connector in Visual Studio – so went and got the Power Query SDK extension.

The nice thing about this is that it allows me to test my connector without needing to constantly start/stop PowerBI. So! We get that installed and create a new Data Connector project:

New Project

This gets you a new Data Connector project with two files that you’ll initially care about: a .pq file and a .query.pq file, the latter being a ‘unit test’ file. Let’s first look at the .pq file.

.PQ

A .pq file is simply a PowerQuery file; it’s written in M, and if you’re a PowerBI specialist – I assume that’s all good – for a non-PowerBI user (me) it means learning some stuff.

So, if you just F5 the project you should get a swirly thing, followed by an error saying credentials are needed.

Select ‘Anonymous’

Then press ‘Set Credential’ – then press F5 again – results!

OK, so what did we actually run when we pressed F5? Remember the .query.pq file? That is executing the Contents() query on the default connector.

let
    result = PQExtension1.Contents()
in
    result

OK, so far so – hum drum. This is really to get you used to the Power BI development experience. The good news is that we can just copy / paste from the old post I did and we can have a working function – taking into account that (a) we have the same data (movies DB) and (b) the same user/pass (neo4j/neo).

[DataSource.Kind="PQExtension1", Publish="PQExtension1.Publish"]
shared PQExtension1.Contents = () =>
let
    Source = 
        Json.Document(
            Web.Contents("http://localhost:7474/db/data/transaction/commit",
            [
                Headers=[Authorization="Basic bmVvNGo6bmVv"],
                Content=Text.ToBinary("{""statements"" : [ {
                        ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
                        }")
            ])),
    results = Source[results]
in
    results;

You should get results saying: [Record] which is what you have – if you get that, you have successfully connected to Neo4j! Good job! Now, first things first, let’s strip out the Authorization header and auto-generate it.

Authorization

We have two forms, anonymous and user/pass. For anonymous, we don’t want to send the header, for user/pass – we obviously do. Let’s start with user/pass as we’re already there.

So let’s add another function, it’ll generate the headers for us, let’s firstly hardcode it:

DefaultRequestHeaders = [
    // Base64-encode "user:pass" for HTTP Basic auth
    #"Authorization" = "Basic " & Binary.ToText(Text.ToBinary("neo4j:neo"), BinaryEncoding.Base64)
];

Changing our function to be:

[DataSource.Kind="PQExtension1", Publish="PQExtension1.Publish"]
shared PQExtension1.Contents = () =>
let
    Source = 
        Json.Document(
            Web.Contents("http://localhost:7474/db/data/transaction/commit",
            [
                //Change HERE vvvvv
                Headers=DefaultRequestHeaders,
                Content=Text.ToBinary("{""statements"" : [ {
                        ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
                        }")
            ])),
    results = Source[results]
in
    results;

Pressing F5 will connect and get the same result. So – we know we can do the base64 conversion in PowerBI – this is good. But, still – not ideal to have usernames and passwords hardcoded – for some reason. So let’s let PowerBI get us a user/pass.

Navigate to the ‘Data Source Kind description’ section and add UsernamePassword there:

// Data Source Kind description
PQExtension1 = [
    Authentication = [
        // Key = [],
        UsernamePassword = [],
        // Windows = [],
        Implicit = []
    ],
    Label = Extension.LoadString("DataSourceLabel")
];

You can add things like ‘UsernameLabel’ in there if you want (I have, but for the purposes of this – I’m not gonna bother) – they make it look pretty for the PowerBI people out there πŸ™‚

OK, when you connect now (you might have to delete your credentials – the easiest way being selecting the ‘Credentials’ tab in the M Query Output window and ‘Delete Credential’) you will be able to select User/Pass as an option.

But hey! We’re not actually using it yet! So let’s get the values from PowerBI using a handy function (that is really hard to find out about) called: Extension.CurrentCredential() with which we can get the username / password, so let’s update our DefaultRequestHeaders to use it:

DefaultRequestHeaders = [
    #"Authorization" = "Basic " & Neo4j.Encode(Extension.CurrentCredential()[Username], Extension.CurrentCredential()[Password])
];
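Neo4j.Encode here is a small helper inside the connector – a minimal sketch of what it could look like (the name and signature are as used above; this is an assumption about the internals, not copied from the source):

```
// A possible implementation of the Neo4j.Encode helper:
// base64-encode "user:pass" for the Basic auth header
Neo4j.Encode = (user as text, pass as text) as text =>
    Binary.ToText(Text.ToBinary(user & ":" & pass), BinaryEncoding.Base64);
```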

OK, the dream is alive! For anonymous, basically we want to remove the headers, to do that we need to check what type of authentication is in use, and we’re back to the hilariously undocumented Extension.CurrentCredential method again:

Headers = if Extension.CurrentCredential()[AuthenticationKind] = "Implicit" then null else  DefaultRequestHeaders,

We look for ‘Implicit’ as that’s what Anonymous is – with this we set the Headers to null if we’re anonymous, and the headers if not – ACE!

Getting Stuff

The crux of the whole operation, now we’re able to connect with user/pass and anonymous, it’s probably time we dealt with the hardcoded Cypher. Let’s take in a parameter to our method:

[DataSource.Kind="PQExtension1", Publish="PQExtension1.Publish"]
shared PQExtension1.Contents = (cypher as text) =>
let
    Source = 
        Json.Document(
            Web.Contents("http://localhost:7474/db/data/transaction/commit",
            [
                Headers=DefaultRequestHeaders,
                //Change HERE vvvvv
                Content=Text.ToBinary("{""statements"" : [ {
                        ""statement"" : "" " & cypher & " ""} ]
                        }")
            ])),
    results = Source[results]
in
    results;

Excellent, now we need to change our .query.pq file to call it:

let 
	result = PQExtension1.Contents("MATCH (n) RETURN COUNT(n)")
in
	result

F5 and see what happens – now we’re passing Cypher to the instance, and getting back results.

This largely covers how to build your own connector. If you look in the source code (and I encourage you to – it’s only 156 lines long including comments) you’ll see I’ve abstracted out some of the stuff we’ve done here, named things properly, and I also pull in the address, port and scheme to allow a user to set them.

Better Know APOC #4 : apoc.coll.sort*

Neo4j Version: 3.3.4
APOC Version: 3.3.0.2

If you haven’t already, please have a look at the intro post to this series, it’ll cover stuff which I’m not going to go into on this.


OK, ‘apoc.coll’ has 43 (that’s right – 43) functions and procedures, but I’m only going to cover the ‘sort’ ones for this post – why? Because a post containing 43 different functions, whilst covering a good % of the overall set, would be way too long.

As it is, with ‘sort’ we have 4 functions:

  • apoc.coll.sort
  • apoc.coll.sortMaps
  • apoc.coll.sortMulti
  • apoc.coll.sortNodes

The Whys

These are methods to sort collections, the clue is in the name, but why do we need them? We can sort in Cypher right? We have ‘ORDER BY‘, who wrote this extra bit of code that has no use?? Who?!?!

When Hunger strikes, run for cover

Hunger. Hmmm given his pedigree we may have to assume this was done for a reason… Let’s explore that a bit with the apoc.coll.sort method…

apoc.coll.sort

This is your basic sort method, given a collection, return it sorted. It doesn’t matter what type the collection is, it will sort it.

Parameters

Just the one for in and one for out, the in is the collection to sort, the out is the sorted collection.

Examples

We’ll look (for this case) at doing it the traditional Cypher way, and then the APOC way.

The Cypher way

It’s worth seeing the Cypher way so you can appreciate sort, this is based on this question on Stack Overflow.

We’ll have a collection which is defined as such:

WITH [2,3,6,5,1,4] AS collection

Let’s sort this the Cypher way – easy!

WITH [2,3,6,5,1,4] AS collection
 RETURN collection ORDER BY ????

Errr, ok, looks like we’re gonna need to tap into some unwinding!

WITH [2,3,6,5,1,4] AS collection
UNWIND collection AS item
WITH item ORDER BY item
RETURN collect(item) AS sorted

That’s got it! So we UNWIND the collection, then WITH each item (ORDER BY) we COLLECT them back again.

The APOC way

WITH [2,3,6,5,1,4] AS collection
RETURN apoc.coll.sort(collection) AS sorted

That’s a lot easier to read, and it’s also a lot easier to use inline. The Cypher version above might look ok, but imagine you have a more complicated query, and you need to either do multiple sorts, or even just anything extra – it can quickly become unwieldy.

apoc.coll.sortMaps

A Map (or Dictionary for those .NETters out there) is the sort of thing we return from Neo4j all the time, and this function allows us to sort on a given property of a Map.

Examples

For these examples, we’ll have ‘coll’ defined as:

WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll

An array of maps, with an ‘Str‘ property, and a ‘Num‘ property.

Sort by string

WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll
RETURN apoc.coll.sortMaps(coll, 'Str')

Returns us a list of the maps, looking like:

╒══════════════════════════════════════════════════════════════════════╕
β”‚"apoc.coll.sortMaps(coll, 'Str')"                                     β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•‘
β”‚[{"Str":"A","Num":4},{"Str":"B","Num":3},{"Str":"C","Num":2},{"Str":"Dβ”‚
β”‚","Num":1}]                                                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

In which we can see the maps go from ‘A’ to ‘D’

Sort by Number

WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll
RETURN apoc.coll.sortMaps(coll, 'Num')

Unsurprisingly, this gets us the following:

╒══════════════════════════════════════════════════════════════════════╕
β”‚"apoc.coll.sortMaps(coll, 'Num')"                                     β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•‘
β”‚[{"Str":"D","Num":1},{"Str":"C","Num":2},{"Str":"B","Num":3},{"Str":"Aβ”‚
β”‚","Num":4}]                                                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Which goes from 1 to 4.

Sort order is Ascending – there is no way to do a descending sort; you basically do a ‘reverse’ to get the sort the other way.
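For instance, to get a descending sort you could wrap the call in Cypher’s reverse() function – a sketch using the same ‘coll’ as above:

```cypher
WITH [{Str:'A', Num:4}, {Str:'B', Num:3}, {Str:'D', Num:1}, {Str:'C', Num:2}] AS coll
RETURN reverse(apoc.coll.sortMaps(coll, 'Num')) AS sortedDesc
```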

apoc.coll.sortMulti

This is the equivalent of doing a ‘Sort, Then By’ – so if I take the ‘sortMaps’ function above and run it like so:

WITH [{First:'B', Last:'B'}, {First:'A', Last:'A'}, {First:'B', Last:'A'}, {First:'C', Last:'A'}] AS coll
RETURN apoc.coll.sortMaps(coll, 'First')

I get:

╒══════════════════════════════════════════════════════════════════════╕
β”‚"apoc.coll.sortMaps(coll, 'First')"                                   β”‚
β•žβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•‘
β”‚[{"Last":"A","First":"A"},{"Last":"B","First":"B"},{"Last":"A","First"β”‚
β”‚:"B"},{"Last":"A","First":"C"}]                                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The problem here is the two elements:

{"Last":"B","First":"B"},{"Last":"A","First":"B"}

I want these to be the other way around, so I have to switch to ‘Multi’:

WITH [{First:'B', Last:'B'}, {First:'A', Last:'A'}, {First:'B', Last:'A'}, {First:'C', Last:'A'}] AS coll
UNWIND apoc.coll.sortMulti(coll, ['^First', '^Last']) AS unwound
RETURN unwound.First AS first, unwound.Last AS last

This gets me:

╒═══════╀══════╕
│"first"│"last"│
╞═══════β•ͺ══════╡
│"A"    │"A"   │
├───────┼──────┤
│"B"    │"A"   │
├───────┼──────┤
│"B"    │"B"   │
├───────┼──────┤
│"C"    │"A"   │
└───────┴──────┘

One thing to note here (and I think it’s quite important) is that this is the only method that defaults to descending order. To get an ascending sort, you have to prefix columns with a β€˜^’ character (as I’ve done in this case).

apoc.coll.sortNodes

Nearly there! This takes a collection of nodes and sorts them on one property – so, let’s add some nodes:

CREATE (n1:CollNode {col1: 1, col2: 'D'})
CREATE (n2:CollNode {col1: 2, col2: 'C'})
CREATE (n3:CollNode {col1: 3, col2: 'B'})
CREATE (n4:CollNode {col1: 4, col2: 'A'})

And let’s do a sort:

MATCH (n:CollNode)
WITH apoc.coll.sortNodes(COLLECT(n), 'col2') AS sorted
UNWIND sorted AS n
RETURN n.col1 AS col1, n.col2 AS col2

Now, you could argue this adds little to the party as you can already ORDER BY, and by and large you’re right – the nice thing about the apoc version is that you can call it as I have above, rather than having to do the sort afterwards. Having said that, ORDER BY does have a DESC keyword as well, which sortNodes does not :/
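
For comparison, the plain Cypher version of that query would be something like:

MATCH (n:CollNode)
RETURN n.col1 AS col1, n.col2 AS col2
ORDER BY n.col2

Same result – the sort is just applied at the end rather than inline on the collection.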

Conclusions

apoc.coll.sort* is useful – that’s the main thrust. Some variants are more useful than others, and there are a few omissions (like the ability to sort descending in all but the sortMulti method) which could make good, simple pull requests.

They are what they are, sorting methods πŸ™‚

Neo4j with Azure Functions

Recently, I’ve had a couple of people ask me how to use Neo4j with Azure functions, and well – I’d not done it myself, but now I have – let’s get it done!

  1. Login to your Azure Portal

  2. Create a new Resource, and search for β€˜function app’:

  3. Select β€˜Function App’ from the Marketplace:

  4. Press β€˜Create’ to actually make one:

  5. Fill in your details as you want them

I’m assuming you’re reasonably au fait with the settings here; in essence, if you have a Resource Group you want to put it into (maybe something with a VNet) then go for it. In my case, I’ve just created a new instance of everything.

  6. Create the function, and wait for it to be ready. Mayhaps make a tea or coffee, have a break from the computer for a couple of mins – it’s all good!

  7. When it’s ready, click on it and go into the Function App itself (if it doesn’t take you there automatically!)

  8. Create a new function:

  9. We want to create an HttpTrigger function in C# for this instance:

  10. This gives us a β€˜run.csx’ file, which will have a load of default code. You can run it if you want, and you’ll see an output window appear with the default response.

Well – good – Azure Functions work, so let’s get a connection to a Neo4j instance. For this I’m assuming you have an IP to connect to – you can always use the free tier on GrapheneDB if you want to play around with this.

  11. Add references to a driver

We need to add a reference to a Neo4j client; in this case I’ll show the official driver, but it will work just as well with the community driver. First off, we need to add a β€˜project.json’ file, so press β€˜View Files’ on the left hand side, then add a file and call it project.json – and yes, it has to be that name.

With our new empty file, we need to paste in the NuGet reference we need:

{
  "frameworks": {
    "net46": {
      "dependencies": {
        "neo4j.driver": "1.5.2"
      }
    }
  }
}

Annoyingly if you copy/paste this into the webpage, the function will add extra β€˜closing’ curly braces, so just delete those.

If you press β€˜Save and Run’ you should get the same response as before – which is good, as it means the Neo4j.Driver package has been installed. If we look at the files, we’ll see the β€˜project.lock.json’ file, which is what we want.

  12. Code

We want to add our connection information now. We’re going to go basic and just return the COUNT of the nodes in our DB. First we need to add a β€˜using’ statement to our code, so add:

using Neo4j.Driver.V1;

Then replace the code in the function with:

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    using (var driver = GraphDatabase.Driver("bolt://YOURIP:7687", AuthTokens.Basic("user", "pass")))
    {
        using (var session = driver.Session())
        {
            IRecord record = session.Run("MATCH (n) RETURN COUNT(n)").Single();
            int count = record["COUNT(n)"].As<int>();
            return req.CreateResponse(HttpStatusCode.OK, "Count: " + count);
        }
    }
}

Basically, we create a Driver, open a session, run the query, and return a 200 with the count!

  13. Run

You can now β€˜Save and Run’ and your output window should now tell you the count.

  14. Done

Your first function using Neo4j, Yay!

Better Know APOC #3 : apoc.date.parse & format

Neo4j Version: 3.0.0
APOC Version: 3.3.0.1

If you haven’t already, please have a look at the intro post to this series, it’ll cover stuff which I’m not going to go into on this.


Dates! For some reason people keep on wanting to keep track of dates (and indeed times) in their systems. Neo4j for a long time was leading the charge in rejecting your modern concepts of date and time, unfortunately – many people still want to use them, and much like the ‘paperless office’ a ‘DateTimeless office’ doesn’t really seem like it’ll be on the cards any time soon.

For a lot of people this meant hand-rolling code to cope with the progress Neo4j had foisted upon them. I, for example, have taken to storing DateTimes in β€˜Tick’ form, because nothing says readable like β€˜636812928000000000β€˜ (that’s Christmas Day, 2018 btw). Let’s imagine I want to display that – in C# (and I assume Java) it’s a doddle: I just create a new DateTime and output it to the screen. But what if I wanted to do this using Cypher? Or indeed want to add stuff to my database without having to write an application to do it?

Introducing apoc.date.parse and apoc.date.format

I’m covering both of these as they are complementary, and typically if you want to use one, you’ll probably want to use the other.

What do they do?

parse‘ parses a given date time string (something like ‘2018/12/25 01:02:03‘), ‘format‘ takes a ‘parsed’ value and converts it to a string.

Setup – Neo4j.conf

Nothing! These areΒ safe and require no conf changes to use.

apoc.date.parse

We’ll look at parse first, as typically this is where most people start – gotta get that data in!

Ins and Outs

Tapping in apoc.help('apoc.date.parse') gives:

Inputs (time :: STRING?, unit = ms :: STRING?, format = yyyy-MM-dd HH:mm:ss :: STRING?, timezone = :: STRING?)
Outputs (INTEGER?)

Inputs

There’s always an input, and parse is no different!

Time

This is our date and time string (badly named), you can pass in just a date:

'2018/3/20'

Using ‘/’, or ‘-‘ separators, it’s all good:

'2018-03-20'

You can add a time:

'2018-03-20 13:34:12'

All you need to do is ensure the pattern you use is reflected in the ‘format’. The default format is listed below.

Unit

This is your output unit, default is millisecond (ms), the values you can convert to are:

  • Millisecond (ms/milli/millis/milliseconds)
  • Second (s/second/seconds)
  • Minutes (m/minute/minutes)
  • Hours (h/hour/hours)
  • Days (d/day/days)

So you can pass in ‘ms’ or ‘milli’ and get the same output.
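
As a quick sketch of that – parsing a date/time to seconds rather than the default milliseconds:

RETURN apoc.date.parse('2018-03-31 13:14:15', 's')

Which should give you 1522502055 – the same value as the millisecond version further down, just divided by 1000.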

Format

Default wise – we’re looking at:Β yyyy-MM-dd HH:mm:ss which is:

  • Full year – all the digits, ’18’ will be treated as the year ’18’, not 2018!
  • Month – 1 or 2 digits, i.e. 1 = January or 01 = January, fancy.
  • Day – Again, 1 or 2 digits
  • Hours – in 24 hour format, so if you want 1pm, that’s 13,
  • Minutes – 1 or 2 digits – I don’t think I need tell you the number of minutes in an hour (right??)
  • Seconds – 1 or 2 digits – and again, the traditional number of seconds in a minute.

But wait! That’s not all – do you want to provide your own format? Not interested in time?Β Only interested in time? Of course! Just put in your own format string (in the Java format) –

  • Just Date: 'yyyy-MM-dd'
  • Just Time: 'HH:mm:ss'

Just for clarification – the capitalisation of the ‘M’Β is important, lowercase = minutes, upper case = months.
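
So, a time-only parse would look something like this (a hypothetical example – note the capital β€˜HH’ for hours and lowercase β€˜mm’ for minutes):

RETURN apoc.date.parse('13:14:15', 's', 'HH:mm:ss')

The date portion defaults to the epoch (1970-01-01), so you’re effectively getting seconds since midnight.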

Timezone

You’ve got 3 options here: the full name of the timezone, the abbreviation, or something depicting the hours difference:

  • Full name:
    Europe/London (I can only assume we’ll need to get this renamed to something like β€˜Brigreatain/London‘ or similar – I’ve put the β€˜great’ back into Britain) – a full list of these is available on the great wikipedia.
  • Abbreviation
    PST, UTC, GMT obviously these are generally more broad strokes than a specific country.
  • Custom
    GMT +8:00
    GMT -8:00

Generally it’s recommended to use the full name. If you choose toΒ not pass in a timezone, the default is "", now, you might well ask yourself –

OK does that mean we’re looking at the timezone of my machine? The machine Neo4j is running on? Actually – what does it mean?!

Well – from the code we can see that it’s UTC, so that’s that cleared up.

Output

Your converted value – or an error (Ha!).

apoc.date.format

We’re gonna jump straight into ‘format’ – as examples wise we may as well put the two together in a date field sandwich.

Ins and Outs

So as to not break with tradition, let’s hit ‘help’: apoc.help('apoc.date.format')

Inputs (time :: INTEGER?, unit = ms :: STRING?, format = yyyy-MM-dd HH:mm:ss :: STRING?, timezone = :: STRING?)
Outputs (STRING?)

Inputs

Despite looking like I just copy/pasted from above, you’ll note a key difference, ‘time’ is now an integer. Exciting!

Time

This is our date and time inΒ Unit format. By that I mean if you’re wanting to convert from milliseconds to readable, you can do that, or even seconds to readable, all you have to do is set the…

Unit

These are the same units as above, so I won’t go over them again. Default wise we’re looking at ‘ms‘.

Format

Your output format this time. So even if you stored right down to the millisecond and want to see only the year, you can do that. The default is the same as with Parse – β€˜yyyy-MM-dd HH:mm:ss’.

Timezone

Did you store as UTC but want to see this in PST? Go for your life! As before the default is UTC.

Output

Your input in a nice readable string format. This is one of those functions we know Michael didn’t write – as he understands ticksΒ natively. Seriously – when you next meet him, ask the time – you’ll get a LONG in response, it is a thing of wonder.

Examples

The obligatory examples section

The Basics (look Mum! No (optional) params!)

We always need to provide at leastΒ a date/time to be able to parse, so:

RETURN apoc.date.parse('2018-03-31 13:14:15')

Gets you:

1522502055000

So let’s pass that back into our format function:

WITH apoc.date.parse('2018-03-31 13:14:15') AS inny
WITH apoc.date.format(inny) AS outy, inny
RETURN *

╒══════════════╀═════════════════════╕
│"inny"        │"outy"               │
╞══════════════β•ͺ═════════════════════╡
│1522502055000 │"2018-03-31 13:14:15"│
└──────────────┴─────────────────────┘

This is the way we’ll proceed with the examples from here on in.

The I want to only see dates example

WITH apoc.date.parse('2018-03-31', 'ms', 'yyyy-MM-dd') AS inny
WITH apoc.date.format(inny) AS outy, inny
RETURN *

╒══════════════╀═════════════════════╕
│"inny"        │"outy"               │
╞══════════════β•ͺ═════════════════════╡
│1522454400000 │"2018-03-31 00:00:00"│
└──────────────┴─────────────────────┘

Note, I parsed only the date, but returned the date and time – this is just to prove that the time isn’t parsed, or rather is, but set to 00:00:00.

I’m not going to go into just time, I think we can all work that out.

The Timezone Fun Example

(Loosest sense of the word β€˜fun’ here) – we pass in a full date with time, using the default of UTC, then convert it back to a PST time, KERRRRAZY.

WITH apoc.date.parse('2018-03-31 13:14:15') AS inny
WITH apoc.date.format(inny, 'ms', 'yyyy-MM-dd HH:mm:ss', 'PST') AS outy, inny
RETURN *

╒══════════════╀═════════════════════╕
│"inny"        │"outy"               │
╞══════════════β•ͺ═════════════════════╡
│1522502055000 │"2018-03-31 06:14:15"│
└──────────────┴─────────────────────┘

Summing up the experience

The DateTime conversion stuff is something that is useful with queries, you can pair it with ‘timestamp()’ if you want to add things like a ‘last logged in’ date:

MATCH (u:User {Id: '123'})
SET
    u.LastLoggedIn = timestamp(),
    u.LastLoggedInReadable = apoc.date.format(timestamp())

Storing date and times as integers makes querying for them easy, and apoc.date.parse makes the query readable:

//Find users who logged in this year
MATCH (u:User)
WHERE u.LastLoggedIn > apoc.date.parse('2018-01-01', 'ms', 'yyyy-MM-dd')
RETURN u

I personally think that’s better than:

MATCH (u:User)
WHERE u.LastLoggedIn > 1514764800000
RETURN u

Anyhews, enough!

Neo4jClient turns 3.0

Well, version wise anyhow!

This is a pretty big release and is one I’ve been working on for a while now (sorry!). Version 3.0 of the client finally adds support for Bolt. When Neo4j released version 3.0 of their database, they added a new binary protocol called Bolt designed to be faster and easier on the ol’ network traffic.

For all versions of Neo4jClient prior to 3.x you could only access the DB via the REST protocol (side effect – you also had to use the client to access any version of Neo4j prior to 3.x).

I’ve tried my hardest to minimise the disruption that could happen, and I’m pretty happy to have it down to mainly a one-line change (assuming you’re passing an IGraphClient around – you are, right??)

So without further ado:

var client = new GraphClient(new Uri("http://localhost:7474/db/data"), "user", "pass");

becomes:

var client = new BoltGraphClient(new Uri("bolt://localhost:7687"), "user", "pass");

I’ve gone through a lot of trials with various objects, and I know others have as well (thank you!) but I’m sure there will still be errors – nothing is bug free! So raise an issue – ask on StackOverflow or Twitter πŸ™‚

If you’re using `PathResults` you will need to swap to `BoltPathResults` – the format changed dramatically between the REST version and the Bolt version – and there’s not a lot I can do about it I’m afraid!
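
As a rough sketch of the Bolt version in use (assuming the usual Neo4jClient fluent Cypher API – the query here is just illustrative):

var client = new BoltGraphClient(new Uri("bolt://localhost:7687"), "user", "pass");
client.Connect();

// Count all the nodes in the DB
var count = client.Cypher
    .Match("(n)")
    .Return(n => n.Count())
    .Results
    .Single();

Everything after the construction of the client should look just as it did with the GraphClient – that’s the point!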