Better Know APOC #1: apoc.export.csv.*
Neo4j Version | 3.3.0 |
APOC Version | 3.3.0.1 |
If you haven’t already, please have a look at the intro post to this series, it’ll cover stuff which I’m not going to go into on this.
We’re going to start with the Export functions – in particular exporting to CSV, mainly as people have asked about it – and well – you’ve got to start somewhere.
apoc.export.csv.*
There are 4 functions documented on the GitHub.IO page:
- apoc.export.csv.query(query, file, config)
- apoc.export.csv.all(file, config)
- apoc.export.csv.data(nodes, rels, file, config)
- apoc.export.csv.graph(graph, file, config)
All of them export a given input to a CSV file, specified by the file parameter.
Setup
There’s a couple of things we need to have in place to use these methods.
Neo4j.conf
We need to let Neo4j know that you allow it to export, and also run the export csv procedures:
apoc.export.file.enabled=true dbms.security.procedures.unrestricted=apoc.export.csv.*
apoc.export.csv.query
In no particular order (aside from the docs order) we’ll look at the .query version of export.csv. This procedure takes a given query and exports is to a csv file – the format of the RETURN
statement in the query directly affects the output, so if you return nodes, you get full node detail.
From the help procedure we get the following for the signature:
Inputs | (query :: STRING?, file :: STRING?, config :: MAP?) |
Outputs | (file :: STRING?, source :: STRING?, format :: STRING?, nodes :: INTEGER?, relationships :: INTEGER?, properties :: INTEGER?, time :: INTEGER?, rows :: INTEGER?) |
Inputs
Query
I hope this is obvious – but – it’s the query you want to use to get your CSV columns – personally I write the query first, make sure it’s working then simple copy/paste into my apoc call, so let’s say I want to get all the Movies a given Person (Tom Hanks) has ACTED_IN, I would do:
MATCH (p:Person {name: ‘Tom Hanks’})-[:ACTED_IN]->(m:Movie) RETURN m.title, m.released
File
This is the filename to export to – I always go fully qualified, but should you want to go relative, it’s relative to the Neo4j home directory
Config
There is only one config setting that affects this procedure:
Parameter | Description |
d | Sets the delimiter for the export, this can only be one character, so something like ‘-‘ is ok, or ‘\t’ (for tabs) |
Outputs
File
This is what you passed in, It doesn’t give you the fully qualified location, just what you passed in.
Source
This will say something like ‘statement: cols(2)’ to indicate the query returned 2 columns, I don’t think I need to explain that the number will change depending on what you return.
Format
Always gonna be csv
Nodes
If you’re returning nodes instead of just properties, this will give you the count of nodes you are exporting.
Relationships
This returns the count of the relationships being returned.
Properties
Will be 0 if you’re just returning nodes, otherwise will be a total count of all the properties returned. So if you’re returning 2 properties, but from 12 nodes, you get 24 properties.
Time
How long the export took in milliseconds – bear in mind – this will be less than the total query time you’ll see in something like the browser, due to rendering etc
Rows
The number of rows returned by the query and put into the CSV file – directly matches the number of lines you’ll have in your CSV file.
Examples
We want to export all the Movie titles and release years for files Tom Hanks has ACTED_IN – we’ve already got that query in place (up above) so lets put it into export.csv:
CALL apoc.export.csv.query( "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title, m.released", "c:/temp/exportedGraph.csv", null )
I like to put the parameters onto different lines, you can whack it all in one line if that’s your fancy! I’m passing null for the config at the moment, as there’s no change I want to make.If I run this, I will get:
We can change that and return just the ‘m’:
CALL apoc.export.csv.query( "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m", "c:/temp/exportedGraph.csv", null )
Which gives a lot more detail – about all of the node:
OK, let’s play with the parameter, now I’m a big fan of the tab delimited format, so let’s make that happen:
CALL apoc.export.csv.query( "MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) RETURN m.title, m.released", "c:/temp/exportedGraph.csv", {d:'\t'} )
That gets us:
apoc.export.csv.all(file, config)
A lot of the following detail is the same as for the above procedure, so this will be short (in fact the next couple as well).
Config is the same – just one parameter to play with – ‘d’ or delimiter. What we don’t have is the ‘query’ parameter anymore – now we get the entire content of the database in one go – boom!
//Tab delimiting the 'all' query CALL apoc.export.csv.all( "c:/temp/exportedGraph.csv", {d:'\t'} )
What you’ll find is that as you have no control over the format of the result, the authors of APOC have kindly pulled all the properties out, so if you run it against our Movies database, you get:
"_id","_labels","name","born","released","title","tagline","_start","_end","_type","roles","rating","summary"
as the header, and obviously all the rows are all the nodes!
But hang on.. Movie doesn’t have a born property – (at least in our DB). If we look at our DB we actually have a Movie node and a Person node, and all is dumping out everything – when you scroll down the file you’ll see the different rows have their _labels property.
(To be said in a Columbo voice) “Just one last thing”… we also get the relationship details – if you scroll down the file you’ll see rows like:
,,,,,,,"322","325","ACTED_IN","[""Neo""]","",""
These are the relationships – so you really are getting everything.
apoc.export.csv.data(nodes, rels, file, config)
OK, in this one we’re looking to pass a collection of Nodes and Relationships to be exported, but how do we get those nodes and relationships? With a query!
MATCH (m:Movie)<-[r:ACTED_IN]-(p:Person {name:'Tom Hanks'}) WITH COLLECT(m) AS movies, COLLECT(r) AS actedIn CALL apoc.export.csv.data( movies, actedIn, 'c:/temp/exportedGraph.csv', null ) YIELD file, nodes, relationships RETURN file, nodes, relationships
A couple of things to note, in line 2, we COLLECT
the m and r values so we can pass them to the APOC procedure.
You may ask “Hey Chris, why exactly are you YIELD
ing? We didn’t need to do that before”, and you’re right of course. But because we have a MATCH
query at the top the Cypher parser won’t let us end with a CALL
clause, so we need to YIELD
the result from the CALL
and then RETURN
those results as well (we’ll have to do this with the next procedure as well).
Now we’ve got the nodes and relationships exported we’ll find when we look at our CSV that it’s taken the same approach as the all procedure and picked out the properties so we’ll end up with a header like:
"_id","_labels","title","tagline","released","_start","_end","_type","roles"
Which makes a LOAD CSV call super easy later on – should that be your dream goal (cypher dreams are reserved for Michael Hunger only 🙂 ).
Config wise – it’s the same as the others, only the ‘d’ option for the delimiter.
apoc.export.csv.graph(graph, file, config)
This procedure exports a graph – simple really. But what is a graph when it’s a home? Well that leads to another APOC procedure – apoc.graph.fromDB
which we’ll not cover here, needless to say Imma going to use it:
CALL apoc.graph.fromDB('movies', {}) YIELD graph CALL apoc.export.csv.graph( graph, 'c:/temp/exportedGraph.csv', null ) YIELD file, nodes, relationships RETURN file, nodes, relationships
Say WHAT??!
OK, simples – the first line exports the whole database (I’m not doing any filtering here) into a graph identifier which I then pass into the call to export.
This exports in exactly the same way as the all version – so if you pass an entire DB – you get an entire DB. In fact – the code above does exactly the same as the call to all just in a more complicated way, and in keeping with that theme, the old faithful ‘d’ for delimiter config parameter is there for your use.
Summary
OK, we’ve looked at all the export.csv procedures now, (unless in the time I’ve taken to write this they’ve put another one in) and hopefully I’ve cleared up a few questions about how it works and what exactly the configuration options are. If not, let me know and I’ll augment.
[…] Chris Skardon has started writing a Better Know APOC series of posts on the popular library. In the first post Chris shows how to install APOC and explains the basic structure of all its procedures. He then has a post looking at the procedures available for exporting data from Neo4j. […]
Hi, Thanks for the sharing the information. I wonder if there is a way to export entire database into separate CSV files based on different
NodeTypes and relationships? That way we can add our own additional CSVs for later import via bulk-csv importer tool , which takes different node csv files and relationship csv files.