Execute Cypher Task Updates

Last week, Anabranch released version 1.1 of the tools for Neo4j – which included a very welcome addition to the toolset – being able to pull data from a Neo4j instance.

After doing the demo post – I noticed a peculiarity – a quirk if you will with how the ‘Execute Cypher Task’ (see here) worked with multiple Neo4j Connection Managers defined – it would execute the Cypher against all the Connection Managers, not any specific one. This makes sense in some form – as a Control Flow Task doesn’t have a ‘Connection’ selector unlike a Data Flow Task.

Version 1.2 of the tools fixes this and tidies up the Execute Cypher Task to have a better user interface as well.

Adding Cypher

The Cypher box still has highlighting, but now lines up to the edges properly.

Choosing A Connection

You can choose from a drop down which connection you want to use. You will only see Neo4j Connection Managers here. The name of the Connection Manager will be the one you set it to.

Validation

You will get a red cross on your task if you’re missing things (in this case the connection). If you look in the ‘Error List’ you will be able to see all the errors:

Getting 1.2

To get version 1.2, please visit: http://bit.ly/neo4jssis register and you’ll be sent out the download link. Registration is only used to let you know of updates to the tools, no marketing!

Using a Data Flow to move data **from** Neo4j in SSIS

That’s right everyone! We’re going from Neo4j this time, and this is a new release, the old version (1.0.0.0) didn’t have a ‘Neo4j as a Source’ component, 1.1.0.0 does.

In the last post we took data from a file and ingested it into Neo4j, so far so good – but one of the things we were missing was the ability to also pull from Neo4j, now the circle is complete, and in this post – I’m going to show you how to pull from one Neo4j instance into another. That’s right – Neo4j to Neo4j!

As always – the video below shows the moving version of this post – but not everyone wants that.

The Setup

OK, more complex than normal, as we need multiple instances of Neo4j running – and whilst that’s not rocket science – it is more complex than normal. I don’t want to go into it particularly – but hey! I would run one DB in Neo4j Desktop, then download one of the server editions (for this Community will be just fine!)

Ports!

You need to change the ports on your new server, as the Desktop ones will be using 7474/7687 etc – So open up the neo4j.conf file and change the following settings:

These are the ports I’m using – but go crazy and pick whatever you want – it’s your database after all. Aaaanyhews – I’m going to assume you know how to start your server version of the database. If not – there’s loads of stuff online about how to do it – and if it becomes clear that we’re in a world of pain here – I’ll write one 🙂

Clear the DBs

WARNING!!! – which I don’t think we need – but here you go – make sure you know which DB you are doing this on! Don’t delete your production DB by mistake!!! (•_•)

On both the DBs we’re going to clear them, and add the ‘Movies’ demo set to one of them so – open up your browser window to both instances (http://localhost:7474 and http://localhost:7676 in my case) and execute:

MATCH (n) DETACH DELETE n

Then in one of the databases (and I will be using my 7474 database) execute:

:play movies

And step to the second step and put the movie data into your database. You can check the data is all there by running:

MATCH (n) RETURN COUNT(n)

You should get 171 nodes. OK, now we’re all set up and ready to go!

Let’s SSIS

As another assumption – I’m going with the fact that you know how to start up Visual Studio and create a new Package.

Let’s first add one connection:

And, obvs pick Neo4j:

Oooh – note the ‘version’ there as well – if yours says lower than that – then bad times 🙁

Rename it to something like ‘The Source’ or whatever you find memorable:

Make sure the user / pass and server are all correct:

Looking good! Now – repeat – for the other server – remembering the port will be different – and choosing a different name, something like ‘The Destination’ for example, and you should end up with this state of affairs:

Let’s add a ‘Data Flow’ to our package now, again you can rename if you want. I did, but don’t let that force you into doing anything:

Double click on it, and we’re into Data Flow design heaven!

Add the Source

Drag the ‘Execute Cypher Source’ component from the toolbox onto the page:

Double click on it to enter the ‘Edit’ page:

The Cypher we’re going to execute is:

MATCH (m:Movie) RETURN m.title AS title

Now – some TOP TIPS. This works best if you RETURN specific columns, SSIS doesn’t know what to do with a full node, and using the AS there makes the output columns easier to use.

Once you’ve got the Cypher – you need to select the Connection to use (see the picture) – which is why naming them nicely is SUPER useful.

Once you’ve done that, hit ‘Refresh’ to get the Output Columns populated:

Job done. Good work!

Add the Destination

No surprises for guessing this involves dragging the Destination to the page.

Next, join up the Source to the Destination:

The UI for this is not as fully fledged out as the other, so unfortunately we need to head into the Advanced Editor. So Right click on it, and open the Advanced Editor:

First we want to set the connection:

Again – naming!!

Then we’re going to go to the ‘Input’ tab and select our input from the Source:

Press OK to save all that, and then double click on the Destination item and go to the Cypher Editor:

First off – you can see the ‘title’ listed in the parameters, so that’s good – Cypher wise we’re doing a MERGE- so we only get one ‘Cloud Atlas’ (because no-one needs more than one of those).

MERGE (:Movie {title: $title})

At this point, we have our two things and no red crosses or errors anywhere, so let’s run it!

Run it!

No surprises – we press ‘Start’ and get the ‘liney’ version of the page which hopefully you see as:

38 rows (hahaha Rows!) and if you go to your ‘Destination’ database you should see the movies there.

I want it

Of course you do – these controls are currently in an open beta, to register to get the controls, please go to: http://bit.ly/neo4jssis

Using a Data Flow to move data from who knows where to Neo4j in SSIS

In what is rapidly becoming a series of posts – we look into another of the components in the Anabranch SSIS Components for Neo4j package. The last post looked at using the “Execute Cypher Task” from within a Control Flow, but that’s not so useful, I mean – it’s great for doing things like Deleting a DB, adding indexes etc, but when we want to get Data from one source to another, we gotta go all Data Flowy.

I’m working on the principle that you’ve gone through the last post, as well, I’m going to pick up from where we left off, and I make no apologies for my assumptions.

Clear the DB

I should mention – please check which DB instance you are connected to – nothing says ‘problem’ quite like deleting your production database.

Let’s first clear the Neo4j instance back to an empty state, run:

MATCH (n) DETACH DELETE n

In the browser.

Clear Package

We don’t want the Execute Cypher Task any more, so select it – and press Delete, or go all Mousey and right-click – the choice is yours

Deleting the mouse way

Let’s Data Flow (Task)!

Drag a Data Flow Task onto the Control Flow workspace:

Double click on the Task to be taken to the Data Flow workspace, which will be empty. So let’s drag a ‘Flat File Source’ to the space:

Double click on the Flat File Source, and the editor will pop up. We need to add a new Connection Manager, so press ‘New…’

Now, we want to use a CSV file, you can use the one I use by downloading from this link, it’s not very exciting I’m afraid, just some names 🙂 Anyhews – fill in the details that match your file (the ones in this picture match my file, the only thing I’ve changed from default is the Code page to be 65001 (UTF-8))

Then click on the ‘Columns’ bit on the left hand side, to make sure it all looks ok, and press OK. You’ll be back to the ‘Flat File Source Editor’ – and you should now click on the ‘Columns’ bit here too:

Make sure at least the First/Last names are checked here – obviously if you’re using your own file – pick your columns! Press OK and go back to the workspace.

Now drag an ‘Execute Cypher Destination’ task to the workspace:

Drag the ‘Blue arrow’ from the Flat File Source, and attach it to the Execute Cypher task:

Then, right click on the execute cypher task, and select ‘Show Advanced Editor…’

First, set the connection manager, we want to use our existing Neo4j Connection Manager

Then we want to select the ‘Input Columns’, just pick them all for now:

Press OK, and then Double click on the Execute Cypher Task, to get the Cypher Editor

Add the Cypher as I have above:

CREATE (:User {First: $FirstName, Last: $LastName})

And press OK.

Do some SSISing!

Now, all that’s left to do is press Start (or Right-click – Execute Task) whichever is your preference!

It’ll run, and give you the following:

Which you can check in your DB by running:

MATCH (n) RETURN COUNT(n)

Things are a bit more interesting now, as we’re pulling from a different source and putting into the database, obviously SSIS supports loads of sources – with

These controls are currently in an open beta, to register to get the controls, please go to: http://bit.ly/neo4jssis

Neo4j & SSIS – Connecting and executing Cypher in a Control Flow

Last Friday, Anabranch released the first beta version of it’s connector to Neo4j from SSIS. Aside from a post saying that it existed, I didn’t go into detail, so this is going to be a series of posts on how you can use your existing SSIS infrastructure with Neo4j.

Today we’re going to look at 2 parts of the connector, the Neo4j Connection Manager (CM) and the Execute Cypher Task (ECT). The CM is fundamental to all the controls, without it, you can’t connect to the database. I’ll go into what it does, settings etc in another post, but for now – it’s enough to know that it provides the connection. The ECT allows us to execute Cypher against a given connection manager.

** NOTE **
In version 1.0.0(beta) – the ECT will only work with the first CM you add to the package

This video covers the same topic as the text version below:

I’m going to develop this in Visual Studio 2017, at the time of writing – I found the 2019 SSIS packages to be a bit flakey, whereas the 2017 has been sturdy so far – from a ‘demo’ point of view though – the 2019 process is exactly the same after you have it all installed.

SETUP
If you’ve never developed against SSIS before, you’ll need a couple of things, firstly SSDT (specifically the Integration services bits), Visual Studio – I think the community edition should work, but I can’t confirm. You’ll also need the Anabranch Ssis Controls for Neo4j – assuming you’ve registered ( http://anabranch.co.uk/Projects/Neo4jSsis) and have the download link, you’ll want the 2017 x86 version of the controls – (for VS2019 as well!).

Download and install the controls. NB. You want to install these when Visual Studio isn’t running – as we’re in the heady world of the GAC here, and VS won’t find them unless it’s started with them there.

Do do this example yourself – you’ll also need a Neo4j database instance running, I’d recommend using the Neo4j Desktop as it makes it easier to manage the process.

Create your first package

1. Start up Visual Studio
2. Create a new Integration Services project

New Project…

3. In the new Package.dtsx file, we need to add a Connection Manager. Right click on the bottom ‘Connection Managers’ bar and add a Neo4j connection – if you don’t see it – you might have to restart Visual Studio, or possibly your machine.

Then select the Neo4j Connection:

You’ll now see it in the ‘Connection Managers’ section:

Select it – and change the connection properties to ones that match your database instance – at the moment this is done via the properties window:

At this stage, we have a connection – but we’re not using it, so let’s add a task to execute:

Drag the ‘Execute Cypher Task’ to the Control Flow, and double click on it. Then add the following Cypher:

CREATE (:Node {Id:1})

Press OK

Then we can execute the task:

Once that’s done:

If we go to our Neo4j Database, we can run:

MATCH (n:Node) RETURN n

If we look at the ‘Id’ property – we can see it is ‘1’

So. Now we have an SSIS integration package executing against a Neo4j database.

These controls are currently in an open beta, to register to get the controls, please go to: http://bit.ly/neo4jssis

Neo4j & SSIS

Neo4j and SSIS are awkward bedfellows – SSIS is Microsoft and has connectors to a plethora of database and technologies using ODBC, Web etc, and Neo4j is written in Java which provides a JDBC connection. SSIS however, does not work with JDBC.

#badtimes

Some of the clients I’ve worked with like using SSIS – (some don’t), and value their 20+ years of using a piece of technology, and want to leverage it with new technologies. Nothing says expensive like having to learn a new database and a new ETL tool.

So today I’d like to introduce you to the beta (maybe alpha) version of the Neo4j Connector for SSIS. It uses bolt to securely connect to your Neo4j instance and call Cypher against it.

Version 1 beta features:

  • Neo4j Connection Manager – manages the connection to the database, and securely encrypts your password (and that is actual encryption) making it safe for you to store.
  • Execute Cypher Task – Allows you to execute a piece of Cypher on a Neo4j instance as part of a Control Flow
  • Execute Cypher Destination – Allows you to execute Cypher against a Neo4j instance as part of a Data Flow
  • Both the above pictures show the basic syntax highlighting as well
  • Works with SSIS 2016, 2017 and 2019 (CTP 3)

Do you want to try it? You need the appropriate installer, there are 6 flavours (6!!), if you’re installing on a Server – you’ll probably want the x64 version of the Server version. What? I know – not that clear. If you’re installing on a SQL Server 2016 instance, use the SQL 2016 x64 installer.

To use on a local designer (VS 2017 or 2019) you’ll want the x86 SQL 2017 version – As the integration services addin for VS 2019 still uses the 2017 install locations.

Please go to here: http://anabranch.co.uk/Projects/Neo4jSsis to get a link via email!

Give me feedback and I’ll put more posts up on how to use some of the features shortly!