Difference between revisions of "HoudiniTops"
|Line 2:||Line 2:|
== Overview ==
== Overview ==
I thought Chops was my most hated part of Houdini
I thought Chops was my most hated part of Houdini Tops . I've gone from confusion, to disdain, to fury, but after nearly a year of fighting it, I'm learning to, well, not ''love'' it, but can appreciate what it can do, and can make it do some useful things.
A huge problem with Tops is terminology, documentation, workflow. Lops was in a similar boat, but at least that had the excuse of deriving from USD
A huge problem with Tops is terminology, documentation, workflow. Lops was in a similar boat, but at least that had the excuse of deriving from USD .
I '' , so 'to explain in terms .
=== Tops vs Rops, obj vs sops, vs farms ===
=== Tops vs Rops, obj vs sops, vs farms ===
Revision as of 17:07, 11 December 2020
Tops has been an emotional ride. I thought Chops was my most hated part of Houdini until picking up Tops about a year ago. I've gone from confusion, to disdain, to fury, but after nearly a year of fighting it, I'm learning to, well, not love it, but can appreciate what it can do, and can make it do some useful things.
A huge problem with Tops is terminology, documentation, workflow. Lops was in a similar boat, but at least that had the excuse of deriving from USD and having to stick to those concepts.
In December 2020 I had that 'click' moment, so figured I'd write down a bunch of notes to help explain Tops in terms most Houdini folk can understand. If any of it isn't clear, let me know!
Tops vs Rops, obj vs sops, vs farms
Tops is a node graph for tasks and processes. It's similar to Rops in that it lets you chain operations together and control execution order. A major difference between Rops and Tops is fine grain control, as well as more general scope. Layering analogies, Rops compared to Tops is like /obj compared to Sops.
- /obj is a container based view of things, you don't see whats going on inside the nodes, and handles a limited set of nodes
- rops is a container based view of rendering, you don't see whats going on inside the nodes, its a limited set of nodes and control
- sops is an atomic view of things, you can get right down to the points, its much more general in scope
- tops is an atomic view of tasks and processing, you can get right down to the frames/individual processes, is much more general in scope
Because Tops has much more awareness of what each node is doing, it can do tricks that can be difficult with rops or standard renderfarm packages. Eg:
- ensure things are run in parallel as much as possible
- never have tasks waiting unnecessarily
- use as many cpu cores as you have as much as possible
- detect when tasks don't need to be re-run, skip over them
- when connected to a farm, do complex control over many machines doing many tasks
- put a simple gui on simple operations, a full python api for the complex things
- has built in support to control other apps, eg maya, nuke, ffmpeg
- has built in support to talk to asset management systems like shotgun
- has built in support to work with tractor, deadline, hqueue
Some of those features work great, others feel a little early and rough, but the potential is there.
Workitems vs points
If going from /obj to sops lets you work with points, going from rops to tops lets you play with workitems, the atomic unit of tops. At its simplest you can think of a workitem as a frame, so if you have a mantra top that renders frames 1 to 100, then when you run the graph you'll see 100 workitems. Tops visualises those as little dots on each top node.
Workitems can be many things
In a similar way that a point in sops can represent an actual point, or a packed primitive, or a RBD object, or whatever else, a workitem doesn't have to be just a rendered image. A workitem could be
- a frame of a render
- the location of a mp4 on disk
- an image sequence
- an asset name
Workitems themselves can be collated together into groups called partitions should you wish to work on groups of things, and expanded out again later into individual workitems, similar to how you can unpack a packed prim back into points.
No really, workitems can be anything
It's worth emphasising how workitems can be much more than frames.
Everything in tops is a workitem, i.e. a unit of work, i.e. a process that is done.
It's intentionally generic nature means it feels quite different to Rops, but means its capable of a lot more. A lot of pipeline tasks, even non vfx tasks can be handled in tops, because ultimately you're not controlling 3d or geometry, you're controlling processes and tasks.
Generating vs cooking
More sops analogies! Watch some of the masterclasses about sops, there's a LOT going on that users aren't aware of. Sidefx folk talk about dirtying sop nodes, traversing bottom up through the graph setting states, then working top down cooking nodes and geometry as it goes. You could think of that first process of traversing bottom-up as generating a list of work to do, and then the working top-down as actually cooking that list of work.
While Sops hides that distinction from the user, Tops shows it all. For simple graphs you don't have to worry about it, tops will generate and cook in one hit for you, but it's good to know the difference when you get stuck on more complex graphs.
The reasoning is you might have some nodes that take hours to calculate, but you don't always need to execute those nodes to know ahead of time how many frames (workitems) it could generate. In fact you might be able to do quite a lot of work designing your graph without ever having to execute any nodes, sort of like keeping Houdini in manual/nocook mode.
You can right click on a node and choose 'generate'. If the node is smart enough, it will generate workitems for you, which you can then cook. Normally you just generate and cook at the same time, with the keyboard shortcut shift-v.
Inputs and outputs
If tops nodes are to be chained together, they need to somehow pass information betweeen them. Similar to sops and point attributes, tops uses workitem attributes. While points must have @ptnum and @P, workitems usually have an index, input and output defined. Broadly speaking, input is what the node will work on, and output is the finished result of that node.
Eg, you have a filepattern top and a ffmpegextractimages top linked together. The filepattern top creates workitems from places on disk, so if you pointed it at $HIP/myvideos/*.mp4, when you cook that node and inspect a single workitem, there's no input attributes (it has no parent node giving it stuff), while output will be $HIP/myvideos/funnycat01.mp4.
Moving to the ffmpeg node, if you r.click on it, choose 'generate', then inspect a workitem, you'll see that input is set to $HIP/myvideos/funnycat01.mp4. Ie, this is the input the node will use to do stuff.
There's no output yet, because the node hasn't cooked. Cook it, inspect the workitem again, there's now an output attribute, which contains an array of the image sequence generated by ffmpeg.
Append another node, say a generic generator (kind of like a fancy null), and generate+cook it, select a workitem, now you can see that the input on this node is the same as the output from the previous node.
Took me a while to get used to this, I couldn't follow the relationship between inputs and outputs. In hindsight it makes sense; outputs from the previous node are copied to inputs for the next node. What this node does to the outputs is up to the node! It might copy the input to the output untouched (say like this generator 'null', or a node that is just creating other attributes), or it could generate completely new outputs (say converting image sequences back into mp4's).
What if you need to get to those attributes in code? What if you want to define your own attributes? What if you want to use those attributes outside of tops?
The input and output are exposed as @pdg_input and @pdg_output. A lot of work in tops is done using hscript expressions on parameters, so most of the time you have to escape them in backticks. There's several implicit pdg attributes like this.
Pdg attributes are available to the rest of houdini, and can be used where you'd do things like $HIP, $OS, $F, $T etc. When the tops graph is run, and that particular workitem within the tops node is being processed, those pdg attributes will be set and can be used.
In the gif I use a file pattern top to find *.jpg in a folder. I can then use `@pdg_output` in cops, and the file cop will load the image from the workitem.
Tops vs standard workflows
Seen in isolation, that above example seems pretty clear right? Now imagine you hadn't read this page, and had been given a hip like that with no documentation. Would you be able to understand what's going on? Probably not. Worse, what if when you load this hip the tops graph is uncooked, so no workitems exist, no workitem is pre-clicked, so the hip just looks broken with no clues as to why? Cos that's exactly what happens. Tops is not forgiving to people who don't know Tops.
Now think about this; when you click on a workitem, you're changing values on nodes somewhere else by clicking on this little dot here. Click a different dot, get a different result.
Expand this thought further, when you cook, you can (and will) have many workitems cooking in parallel, and probably not just on this node, there'll be slow workitems from the previous node still cooking, while faster items might have raced ahead to the next tops node.
In this setup, what is @pdg_output? In the previous example, what will I see in the comp window when all these things are running in parallel?
The answer sounds like a riddle from The Matrix; the workitems are really running in parallel in their own little worlds, when you click on a workitem you're seeing a snapshot of work thats already done; they're cooked values, you're viewing one particular 'reality' out of all the workitems and all the nodes in your graph.
It's not quite as quantum state as it sounds, but it's definitely a new mental model to overlay on top of your existing mental model of Houdini. It's sort of like takes, its sort of like wedging and hscript, its sort of like using a renderfarm, but also not like all those things.
In other words, its new, and new can be confusing.
ALSO also, the state of the tops nodes and workitems aren't saved with the hip. So while you might have a cooked graph, selected a workitem near the end of your flow, see a cool result and hit save, when you reload the hip (or worse, give it to someone else), the graph looks uncooked, with no workitems selected.
Now to be fair if you recook, tops has a pretty clever caching and 'rework avoidance' design, so it will quickly work out the nodes don't need to be recooked, and you can preview that workitem again. But still, you need to know this. If you're handing work off to another artist, that artist needs to know 'ok, the hip will look broken when you first load it, so go to tops, cook the graph, select the last node, click a workitem dot'. Again, more new workflows, more stuff for existing Houdini artists to be frustrated by.
I was very much in this camp until I forced myself to get into Tops. I'm better with it now, but even then, Tops isn't quite as self documenting and discoverable as Sops (or Rops) are. Be aware that you have to commit and train a team to use Tops.
Generate mode, automatic vs other
Short version: If things are acting weird, set the 'generate when' option to 'each upstream item is cooked', you'll get a little purple icon to say it's now dynamic, stuff should work.
I mentioned earlier that Tops separates generate vs cook. Ideally tops can generate workitems without needing to cook, like reading the frame range parameters from a mantra top.
But there'll be cases when that can't be known without actually cooking the node. Eg a ffmpeg extract images top has to actually extract the images before it can tell you it made 135 images, and therefore give you 135 workitems.
For most simple cases Tops can automatically know if can generate workitems in advance, or if it has to cook the node first, which is why the default generate mode is 'automatic'.
But for some nodes, it needs a helping hand, hence the other modes.
A tricky situation is when you cook a graph and it generates bad results, but if you click a workitem, the tops attributes appear to be correct. In an earlier section I mentioned that clicking on workitems is like debugging after the cook is done. It's possible for a generate step to make the wrong amount of workitems, or the wrong attribute values, therefore the cook generates the wrong results on disk, but the very act of cooking fixes the incorrect workitems and attribute values!
In these cases, if things appear to be in a wierd working-but-broken state when inspected, the first thing I do is change the generate mode to not be automatic, but to force it to wait for upstream items to cook before it tries to generate its own workitems.
Short version: If multiple runs of your tops graph look the same, delete results on disk, try again.
Long version: Because Tops can get right down to that atomic workitem level, it can do some tricks that aren't possible in Rops or other systems. A big part of this is recognising when parts of the network have already been run, and don't need to be recooked.
The example I keep coming back to here of processing a folder of mp4s. Say you had the following chain of tops nodes:
- ffmpeg extract images to converting mp4s into image sequences
- fetch to to run a cops graph that processes those images to a temp location
- fetch to run a geometry rop that traces the images
- fetch to a opengl rop
- fetch to a VAT rop
etc. Obviously a lot of that only needs to run once if you're making minor changes here and there to the network, or adding a new video to a folder full of already processed videos.
As part of the cook process, wherever possible top nodes will check if output exists where it expects to write files. If it does, it will mark that workitem as complete, skipping a lot of processing time.
Of course this is great when you expect it, infuriating when you don't.
Most nodes have a r.click option after its been cooked, 'delete this nodes results from disk'. For the most part it does the right thing, and it will then of course force a recook of all the workitems on the next run.
If you just want a single workitem recooked, you can go on disk and delete whatever cache that is. I've found there's a r.click menu per workitem dot to get that atomic with your deletes.
Note that sometimes tops gets confused and won't delete files, or will delete too much, best to keen an eye on your files while doing these operations until you get a feel for it.
Caching, be aware of it, make it work for you.
Simple cache a sim then render locally
Download hip: File:pdg_basic_v01.hip
Most existing Houdini users want the basics from PDG; cache a sim to disk, run a render. Maybe chaser mode as a bonus? FFmpeg the result into an mp4, why not eh, YOLO!
Here's that setup. Click the triangle with the orange thingy on it to start.
cache sim is a fetch top that points to a disk cache sop after a simulation. You DON'T want a sim cache running on multiple threads/machines, it should just be one job that runs sequentually. To do this enable 'all frames in one batch'.
map by index controls execution order and limits how jobs are created. If you have node A that generates 10 things, connected to node B that is also set to generate 10 things, PDG's default behavior is to generate 10 B things for each thing made by A. In other words, you'll get 10 x 10 = 100 total tasks. For situations like this, that's definitely not what you want.
The mapbyindex ensures tasks are linked together, so 1 frame of the cache is linked to 1 frame of the render. Further, it allows a 'chaser' mode, in that as soon as frame 1 of the sim cache is done, frame 1 of the mantra render can start, frame 2 of the sim cache is done, frame 2 of the mantra render can start etc.
mantra is another fetch top that points to a mantra rop.
waitforall does as implied, it won't let downstream nodes start until all the upstream nodes are completed. It also subtly adjust the flow of tasks; the previous nodes have 48 dots representing the individual frames, this node has a single rectangle, implying its now treating the frame sequence as a single unit.
ffmpeg top needs some explaining (and some adjustments to the fetch top that calls the mantra rop), which I explain below.
Note that the frameranges on the fetch tops override the ranges set on their target rops by default.
Also note that the button with the orange thingy on it kicks off the output, looking for the matching node with the orange output flag. See in that screenshot how I've left it on the mantra node? That means it'll never run the ffmpeg task. I'm an idiot.
Wedging a rock generator
It was either this or a fence generator, lord knows we need more tutorials on both these important things.
Here's the hip before getting into tops if you want to follow along:
In this hip is a straightforward rock generator. The sops flow is
- high res sphere
- some scattered points with random scale and N+up
- copy spheres to points
- point vop to displace spheres with worley noise
- attrib noise for more high frequency detail
- attrib noise for colour
- Cd converted to HSV and back again to roughly match the colour of the env background.
So with this all setup, we could randomise a bunch of stuff with tops.
The first thing we'll do is wedge the number of points in the scatter. We'll create a wedge top, which will make a pdg attribute we can reference on the scatter.
- Create a topnet
- Create a wedge top
- Set the wedge count to 5, so we get 5 variations
- Add a new wedge attribute with the multilister
- Attrib name scatternum
- Attrib type Integer
- Set start/end to 2 and 6, so we'll generate a minimum of 2 scatter points, a maximum of 6.
- Shift-v on the node to cook it and see what we have so far.
Middle clicking on each workitem, we can see that each workitem has a scatternum attribute, starting at 2 and ending at 6. That might be useful for other things, but here we don't want it to be gradually increasing, we want it to be a random integer between 2 and 6. Enable random samples, cook, look again.
That's better, random samples for each workitem.
To use this in the scatter sop, all we do is type @scatternum into the scatter force total count parameter, and bam, its connected.
Add more entries to the wedge multilister, fill in parameters over in sops, bosh, you have a wedge setup. Note that when you create vectors, you access the components with @attrib.0, @attrib.1, @attrib.2.
Eg here I create a noiseoffset wedge, and drive the point vop noise offset with it.
So that's all wedging, how do we write this out? We could use a disk cache sop, and set the output name to use @pdg_index, which corresponds to the id of each workitem (ie, 0 to 4 in this case of 5 wedges). Or you could use the cache top which basically does the same thing.
- append a rop geometry output top to the wedge
- set the sop path to the end of your sop chain, /obj/rocks/OUT_ROCK in my case
- set the output file parameter to make a unique file per workitem, eg $HIP/geo/$HIPNAME.$OS.`@pdg_index`.bgeo.sc
- Now if you click through the workitems in the previous node, you can see the file path change if you mmb on the label.
- cook the node, and it'll be baked to disk.
Here's the finished hip:
Download hip: File:tops_rockgen_end.hip
The help mentions some stuff in the task menu, I couldn't see it.
Well, it's there. Hidden in plain sight.
What do the colours mean in the mmb info for workitems?
Colours represent the attribute types.
- dark cyan - internal attributes
- pink - string
- green - integer
- yellow - float
@pdg_input is blank
In summary, check the nodes above the erroring one, especially if they have an unchecked copy inputs to outputs toggle.
I've had a few occasions where I'll have a node error, go inspect a workitem and see that where I'm expecting to find a value for `@pdg_input` in a parameter, its actually an empty string.
The culprit is always a previous node. Related to what I mentioned before, nodes usually expect an input, and most of the times set an output. What can happen if you're not careful is some top nodes may not set an output, or don't do the implicit 'copy input to output' if they don't need to. The generic generator node, which I use occasionally as null to just have a place to see whats flowing through, is an example of this.
Set a limit on the number of workitems
Say you have a folder full of images that you want to process, but for testing just want the first 5 images.
A filterbyrange top will let you do this.
Pick 2 items from each group of workitems
I have a filepattern searching an inputs folder. That folder is full of subfolders, say animal names, and in each of those are mp4's I want to process. So the folder structure might look like this:
/inputs/dog/poodle.mp4 /inputs/dog/pug.mp4 /inputs/dog/terrier.mp4 /inputs/dog/labrador.mp4 /inputs/cat/black.mp4 /inputs/cat/striped.mp4 /inputs/cat/white.mp4 /inputs/cat/ginger.mp4 /inputs/cat/longhair.mp4 /inputs/bird/parrot.mp4 /inputs/bird/gull.mp4
Now say while testing I only want to grab 1 or 2 from each category?
A partition top lets you collate workitems, sort of like packing geometry. You can partition in many ways here I'd use a partition by attribute top. I'd use an attrib from string top to split off the animal type into its own attribute, @animal, then partition by 'animal'. Make sure 'partition attribtes independently'.
Now we can unpack them again using a work item expand, but handily this has several ways to do that unpack. A handy one is 'first N', so here I can just say give me the first 2 from each animal, and I get just that.
On this node, make sure apply expansion to: is set to 'items in upstream partition', otherwise it can do odd things like duplicate the first item it finds per animal twice.
MATT, YOU KEEP FORGETTING TO DO THIS, SET 'ITEMS IN UPSTREAM PARTITION'!!!!
Finally to make sure the output is sorted per animal, a sort top can be used, with the name parameter set to animal.
workitem @pdg_index isn't unique like @ptnum
For a while I was treating the workitem @pdg_index like @ptnum, in that I assumed it was always unique, would resort and reindex as you add and remove workitems.
I've since found that's not the case. If you do things like merge different streams, or do partitioning and unpacking, you could have 5 workitems that all have a @pdg_index of 0. That means if you do things like say limit by range to 0, thinking you'll just get the single zeroth entry, you'll get back 5.
A bit confusing, you have to use things like sort tops and other nodes to force workitems to reindex, I'm still working out the best method for this.
Get framecount from ffmpeg extract images
The ffmpeg extract images node usually generates a output attribute which is an array of the images its created. At some point this stopped working for me, so I had to find another way to count the images.
After talking to support it was due to putting quotes around the output parameter path. Doh. Still, leaving this here as it's still a good mini example of tops workflows.
So as I said, we could try and find the files in the ffmpeg output folder for the node. A filePattern top can do this. If the ffmpegextractimages node is using the default
The following filepattern node should point to the folder above it, ie
It also has 'split files into seperate items' turned OFF, so that way I just have a single workitem per image sequence.
If you try this now, it won't generate any workitems. The default generate mode will mean it tries to look in that folder straight away, finds no images, and as such returns no workitems. Change the generate mode to 'each upstream item is cooked', then it works as expected.
Ok great, but where's the actual number of frames? It's there, annoyingly hidden. There's some tops attributes that don't appear in the middle click info, one of those is @pdg_outputsize. In this case, unsurprisingly, it returns the amount of frames in the sequence. So with an attribute create node, you can create a new integer variable called framecount, and set an expression to use @pdg_outputsize.
Note that you don't need to change the generate mode on the attribute create. As soon as any upstream node is set to be dynamic (ie, it has to wait for previous items to cook), all subsequent nodes are also made dynamic.
Create attribute from string array element
Related to the previous example, I used an attribute from string node to split a directory path into components, and then wanted to create a new attribute based on the last part of that path. Annoyingly I couldn't work out how to get an array element from the standard attribute create node, so I gave up and used a python script node instead:
renderpass = work_item.attrib('split')[-1] work_item.setStringAttrib("renderpass", renderpass)
Had another go, here's a purely node based approach:
The tops method for getting array elements is @attribute.index, eg @split.0. Here I wanted the last element, but there's no python style -1 syntax, so instead I create the array, reverse it, read the 0th element. Specifically:
- attribute from string top, split by delimiter enabled, '/' as the delimeter
- attribute array top, update existing 'split' attribute, reversed enabled
- attribute create top, uses `@split.0`
Ffmpeg and non sidefx rops
I had a few issues getting ffmpeg to make mp4's from a renderman rop. In the end the fixes were relatively straightfoward.
The ffmpeg top needs to know a few things:
- what part of the upstream nodes are making images, set with output file tag
- what the location of those images are, set with output parm name
- that the images are bundled into a single unit of work, using a waitforall top.
Top nodes can tag their output(s), in this case the ffmpeg top expects the images to have a 'file/image' tag. On the fetch top for renderman rop enable 'output file tag' and use the dropdown to select 'file/image'
To know what file name to put in that tag, enable 'output parm name' and set it to 'ri_display_0'. This is the parameter on the ris rop where the image path is set.
To bundle all the frames into a single unit, use a waitforall top.
A last specific thing for our setup, our build of ffmpeg didn't understand the '-apply_trc' option, so I disabled it.
Force python scripts to run on the farm
If you have a python script node, even if you have a tractor or deadline scheduler, it will run in your local houdini session by default.
To fix this, turn off 'Evaluate in process'.
Ensure a rop geometry top sim runs on a single blade
You don't want 240 machines all doing their own run up, that's silly. Go to the 'rop fetch' tab, enable 'all frames in one batch', that'll lock it to a single blade and run sequentially.
Tractor scheduler stuck
too often less often after some fixes from sidefx. Tricks to unstick it in order of least to most annoying:
- Make sure there's no active stuck jobs of yours on the farm, delete 'em all and try again
- R.click on tractor scheduler, 'delete temp directory'
- Select the tractor scheduler, ctrl-x to cut it, ctrl-v to paste it
- Reload the hip
- Restart houdini
- Quit FX
- Quit the industry
Rez and tops debugging
Running this in a python script top to see whats going on with rez and environment values:
print('debug info...') key = 'REZ_RESOLVE' print(key+'='+os.environ[key]) print ('') import pdg has_tractor = str(pdg.types.schedulers.has_tractor) print('pdg.types.schedulers.has_tractor: ' + has_tractor) print ('')
Mostly works, the long story can be found below, but here's the summary:
- Your environment needs access to the python tractor api. If you use rez, make sure to bring in a package for tractor.
- PDG assumes it'll find $PYTHON set correctly. We didn't have this, but even then I found I couldn't use the regular system python, but had to point it to hython ( $HFS/bin/hython )
- If your farm is behind a firewall, make sure your IT department chooses 2 ports you can use, and enter those ports into the callback and relay port fields on the tractor scheduler
- As of 18.0.502 retry support exists on the tractor scheduler, as well as options for better logging.
- Cooking jobs by default expects to connect to your desktop machine to update information, give you blinky lights and dots. This means that if you close your houdini session, the job will stop working on the farm. Call me old fashioned, but that defeats most of the point of using a farm. If you don't want this, use the 'submit graph as job' option at the top of the tractor scheduler, and it will run independent of your GUI session. Getting these to work reliably was problematic for us, YMMV.
Tops and tractor diary
Moving the diary to TopsTractorDiary.