Difference between revisions of "HoudiniTops"
|Line 23:||Line 23:|
* has built in support to work with tractor, deadline, hqueue
* has built in support to work with tractor, deadline, hqueue
Some of those features work great, others feel a little early and rough, but the potential is there
Some of those features work great, others feel a little early and rough, but the potential is there.
=== Workitems vs points ===
=== Workitems vs points ===
Revision as of 06:27, 10 December 2020
I thought Chops was my most hated part of Houdini. Then I tried using Tops. I've gone from confusion, to disdain, to fury, but after nearly a year of fighting it, I'm learning to, well, not love it, but can appreciate what it can do, and can make it do some useful things.
A huge problem with Tops is terminology, documentation, workflow. Lops was in a similar boat, but at least that had the excuse of deriving from USD. Tops should have made the bridge from existing Rops/Sops knowledge into Tops much easier.
Anyway, I think I have a handle on it now, so here's an attempt to explain it in simple terms.
Tops vs Rops vs sops vs farms
Tops is a node graph for tasks and processes. It's similar to Rops in that it lets you chain operations together and control execution order. Rops only lets you see what's going on at a high level, eg 'run mantra' or 'do a sim', while Tops is like going from /obj down into sops and playing with points; you can see the inner workings of the stuff you're running.
Because Tops has much more awareness of what each node is doing, it can do tricks that can be difficult with rops or standard renderfarm packages. Eg:
- ensure things are run in parallel as much as possible
- never have tasks waiting unnecessarily
- use as many cpu cores as you have as much as possible
- detect when tasks don't need to be re-run, skip over them
- when connected to a farm, do complex control over many machines doing many tasks
- put a simple gui on simple operations, a full python api for the complex things
- has built in support to control other apps, eg maya, nuke, ffmpeg
- has built in support to talk to asset management systems like shotgun
- has built in support to work with tractor, deadline, hqueue
Some of those features work great, others feel a little early and rough, but the potential is there.
Workitems vs points
If going from /obj to sops lets you work with points, going from rops to tops lets you play with workitems, the atomic unit of tops. At its simplest you can think of a workitem as a frame, so if you have a mantra top that renders frames 1 to 100, then when you run the graph you'll see 100 workitems. Tops visualises those as little dots on each top node.
Workitems can be many things
In a similar way that a point in sops can represent an actual point, or a packed primitive, or a RBD object, or whatever else, a workitem doesn't have to be just a rendered image. A workitem could be
- a frame of a render
- the location of a mp4 on disk
- an image sequence
- an asset name
Workitems themselves can be collated together into groups called partitions should you wish to work on groups of things, and expanded out again later into individual workitems, similar to how you can unpack a packed prim back into points.
No really, workitems can be anything
It's worth emphasising how workitems can be much more than frames.
Everything in tops is a workitem, i.e. a unit of work, i.e. a process that is done.
It's intentionally generic nature means it feels quite different to Rops, but means its capable of a lot more. A lot of pipeline tasks, even non vfx tasks can be handled in tops, because ultimately you're not controlling 3d or geometry, you're controlling processes and tasks.
Generating vs cooking
More sops analogies! Watch some of the masterclasses about sops, there's a LOT going on that users aren't aware of. Sidefx folk talk about dirtying sop nodes, traversing bottom up through the graph setting states, then working top down cooking nodes and geometry as it goes. You could think of that first process of traversing bottom-up as generating a list of work to do, and then the working top-down as actually cooking that list of work.
While Sops hides that distinction from the user, Tops shows it all. For simple graphs you don't have to worry about it, tops will generate and cook in one hit for you, but it's good to know the difference when you get stuck on more complex graphs.
The reasoning is you might have some nodes that take hours to calculate, but you don't always need to execute those nodes to know ahead of time how many frames (workitems) it could generate. In fact you might be able to do quite a lot of work designing your graph without ever having to execute any nodes, sort of like keeping Houdini in manual/nocook mode.
You can right click on a node and choose 'generate'. If the node is smart enough, it will generate workitems for you, which you can then cook. Normally you just generate and cook at the same time, with the keyboard shortcut shift-v.
Inputs and outputs
If tops nodes are to be chained together, they need to somehow pass information betweeen them. Similar to sops and point attributes, tops uses workitem attributes. While points must have @ptnum and @P, workitems usually have an index, input and output defined. Broadly speaking, input is what the node will work on, and output is the finished result of that node.
Eg, you have a filepattern top and a ffmpegextractimages top linked together. The filepattern top creates workitems from places on disk, so if you pointed it at $HIP/myvideos/*.mp4, when you cook that node and inspect a single workitem, there's no input attributes (it has no parent node giving it stuff), while output will be $HIP/myvideos/funnycat01.mp4.
Moving to the ffmpeg node, if you r.click on it, choose 'generate', then inspect a workitem, you'll see that input is set to $HIP/myvideos/funnycat01.mp4. Ie, this is the input the node will use to do stuff.
There's no output yet, because the node hasn't cooked. Cook it, inspect the workitem again, there's now an output attribute, which contains an array of the image sequence generated by ffmpeg.
Append another node, say a generic generator (kind of like a fancy null), and generate+cook it, select a workitem, now you can see that the input on this node is the same as the output from the previous node.
Took me a while to get used to this, I couldn't follow the relationship between inputs and outputs. In hindsight it makes sense; outputs from the previous node are copied to inputs for the next node. What this node does to the outputs is up to the node! It might copy the input to the output untouched (say like this generator 'null', or a node that is just creating other attributes), or it could generate completely new outputs (say converting image sequences back into mp4's).
What if you need to get to those attributes in code? What if you want to define your own attributes? What if you want to use those attributes outside of tops?
The input and output are exposed as @pdg_input and @pdg_output. A lot of work in tops is done using hscript expressions on parameters, so most of the time you have to escape them in backticks. There's several implicit pdg attributes like this.
Pdg attributes are available to the rest of houdini, and can be used where you'd do things like $HIP, $OS, $F, $T etc. When the tops graph is run, and that particular workitem within the tops node is being processed, those pdg attributes will be set and can be used. Eg, you use a Fetch top to run a rop somewhere else in houdini, say a cops graph. You could set the file cop at the top of your compositing network to use `@pdg_input` as the image source, and it will then be replaced with whatever image sequence you choose to find (or generate) in tops.
Attributes and the rest of houdini
It's also worth noting that because these attributes are specific to the workitem, if you start using these in your houdini setups, they can be tricky to understand for other houdini users, especially those who aren't familiar with tops (which is most of the Houdini artist pool). Those @pdg attributes don't exist until the tops graph is cooked, and even when the graph is cooked, it won't do anything in the GUI until you click on a worktime dot, and even then, you probalby have to make sure you click the correct workitem dot on the correct tops node.
Tops and the houdini gui
Aka 'Why can't I see anything?'
It's VERY confusing for new users, moreso because the state of the tops graph isn't remembered between saves. Ie you can cook the graph, save the hip, reload the hip, the tops graph looks uncooked again. Be careful.
Short version: If things are acting weird, set the 'generate when' option to dynamic, you'll get a little purple icon to say it's now dynamic, stuff should work.
Long version: Tops makes a distinction between 'generate' and 'cook' steps for nodes. Ideally you would know ahead of time what each node is going to produce, so you get a sense of how many images/sim caches/mp4's you're going to generate before the work is done. But sometimes you can't know that in advance, and you have to adjust workflow accordingly.
Eg, say you're using a Fetch Top to run a mantra render. You don't need to execute the render to find out how many frames you'll generate, its right there in the parameters on the mantra rop (or better yet on the fetch top itself). If you r.click the node and go 'generate', it'll populate with as many dots as you have frames in the render. Easy.
Now say you have a ffmpegextractimages top, and you want to use an attribute create top to set @framecount, the number of frames that were extracted. Tops cannot know this number until ffmpeg actually runs.
The 'generate when' mode at the top of every node tells the node when to do its workitem calculation (the generate step). The default is 'automatic', tops will try and guess if nodes should wait for previous nodes to cook or not. But sometimes tops guesses wrong, and things will silently misbehave. In this case with the framecount setup, I had to set the mode to 'each upstream item is cooked'. So now when I'm processing many videos, only when each video has been extracted will the attribute create node start, and be able to create the right result.
Short version: If multiple runs of your tops graph look the same, delete results on disk, try again.
Long version: Because Tops can get right down to that atomic workitem level, it can do some tricks that aren't possible in Rops or other systems. A big part of this is recognising when parts of the network have already been run, and don't need to be recooked.
The example I keep coming back to here of processing a folder of mp4s. Say you had the following chain of tops nodes:
- extract images top converting mp4s to image sequences
- fetch to to run a cops graph that processes those images to a temp location
- fetch to run a geometry rop that traces the images
- fetch to a opengl rop
- fetch to a VAT rop
etc. Obviously a lot of that only needs to run once if you're making minor changes here and there to the network, or adding a new video to a folder full of already processed videos.
As part of the cook process, wherever possible top nodes will check if output exists where it expects to write files. If it does, it will mark that workitem as complete, saving a lot of processing time.
Of course this is great when you expect it, infuriating when you don't.
Most nodes have a r.click option after its been cooked, 'delete this nodes results from disk'. For the most part it does the right thing, and it will then of course force a recook of all the workitems on the next run.
If you just want a single workitem recooked, you can go on disk and delete whatever cache that is. I've found there's a r.click menu per workitem dot too that also claims to do it, but then I also trusted that menu to show me the files linked to only the current workitem dot, instead it spawned 600 mplay windows, one for every frame of every workitem I had, and had to restart my computer. YMMV.
Still, caching, be aware of it, make it work for you.
Simple cache then render locally
Download hip: File:pdg_basic_v01.hip
Most existing Houdini users want the basics from PDG; cache a sim to disk, run a render. Maybe chaser mode as a bonus? FFmpeg the result into an mp4, why not eh, YOLO!
Here's that setup. Click the triangle with the orange thingy on it to start.
cache sim is a fetch top that points to a disk cache sop after a simulation. You DON'T want a sim cache running on multiple threads/machines, it should just be one job that runs sequentually. To do this enable 'all frames in one batch'.
map by index controls execution order and limits how jobs are created. If you have node A that generates 10 things, connected to node B that is also set to generate 10 things, PDG's default behavior is to generate 10 B things for each thing made by A. In other words, you'll get 10 x 10 = 100 total tasks. For situations like this, that's definitely not what you want.
The mapbyindex ensures tasks are linked together, so 1 frame of the cache is linked to 1 frame of the render. Further, it allows a 'chaser' mode, in that as soon as frame 1 of the sim cache is done, frame 1 of the mantra render can start, frame 2 of the sim cache is done, frame 2 of the mantra render can start etc.
mantra is another fetch top that points to a mantra rop.
waitforall does as implied, it won't let downstream nodes start until all the upstream nodes are completed. It also subtly adjust the flow of tasks; the previous nodes have 48 dots representing the individual frames, this node has a single rectangle, implying its now treating the frame sequence as a single unit.
ffmpeg top needs some explaining (and some adjustments to the fetch top that calls the mantra rop), which I explain below.
Note that the frameranges on the fetch tops override the ranges set on their target rops by default.
Also note that the button with the orange thingy on it kicks off the output, looking for the matching node with the orange output flag. See in that screenshot how I've left it on the mantra node? That means it'll never run the ffmpeg task. I'm an idiot.
The help mentions some stuff in the task menu, I couldn't see it.
Well, it's there. Hidden in plain sight.
@pdg_input is blank
In summary, check the nodes above the erroring one, especially if they have an unchecked copy inputs to outputs toggle.
I've had a few occasions where I'll have a node error, go inspect a workitem and see that where I'm expecting to find a value for `@pdg_input` in a parameter, its actually an empty string.
The culprit is always a previous node. Related to what I mentioned before, nodes usually expect an input, and most of the times set an output. What can happen if you're not careful is some top nodes may not set an output, or don't do the implicit 'copy input to output' if they don't need to. The generic generator node, which I use occasionally as null to just have a place to see whats flowing through, is an example of this.
Set a limit on the number of workitems
Say you have a folder full of images that you want to process, but for testing just want the first 5 images.
A filterbyrange top will let you do this.
Pick 2 items from each group of workitems
I have a filepattern searching an inputs folder. That folder is full of subfolders, say animal names, and in each of those are mp4's I want to process. So the folder structure might look like this:
/inputs/dog/poodle.mp4 /inputs/dog/pug.mp4 /inputs/dog/terrier.mp4 /inputs/dog/labrador.mp4 /inputs/cat/black.mp4 /inputs/cat/striped.mp4 /inputs/cat/white.mp4 /inputs/cat/ginger.mp4 /inputs/cat/longhair.mp4 /inputs/bird/parrot.mp4 /inputs/bird/gull.mp4
Now say while testing I only want to grab 1 or 2 from each category?
A partition top lets you collate workitems, sort of like packing geometry. You can partition in many ways here I'd use a partition by attribute top. I'd use an attrib from string top to split off the animal type into its own attribute, @animal, then partition by 'animal'. Make sure 'partition attribtes independently'.
Now we can unpack them again using a work item expand, but handily this has several ways to do that unpack. A handy one is 'first N', so here I can just say give me the first 2 from each animal, and I get just that.
On this node, make sure apply expansion to: is set to 'items in upstream partition', otherwise it can do odd things like duplicate the first item it finds per animal twice.
MATT, YOU KEEP FORGETTING TO DO THIS, SET 'ITEMS IN UPSTREAM PARTITION'!!!!
Finally to make sure the output is sorted per animal, a sort top can be used, with the name parameter set to animal.
Get framecount from ffmpeg extract images
The ffmpeg extract images node usually generates a output attribute which is an array of the images its created. At some point this stopped working for me, so I had to find another way to count the images.
After talking to support it was due to putting quotes around the output parameter path. Doh. Still, leaving this here as it's still a good mini example of tops workflows.
So as I said, we could try and find the files in the ffmpeg output folder for the node. A filePattern top can do this. If the ffmpegextractimages node is using the default
The following filepattern node should point to the folder above it, ie
It also has 'split files into seperate items' turned OFF, so that way I just have a single workitem per image sequence.
If you try this now, it won't generate any workitems. The default generate mode will mean it tries to look in that folder straight away, finds no images, and as such returns no workitems. Change the generate mode to 'each upstream item is cooked', then it works as expected.
Ok great, but where's the actual number of frames? It's there, annoyingly hidden. There's some tops attributes that don't appear in the middle click info, one of those is @pdg_outputsize. In this case, unsurprisingly, it returns the amount of frames in the sequence. So with an attribute create node, you can create a new integer variable called framecount, and set an expression to use @pdg_outputsize.
Note that you don't need to change the generate mode on the attribute create. As soon as any upstream node is set to be dynamic (ie, it has to wait for previous items to cook), all subsequent nodes are also made dynamic.
Create attribute from string array element
Related to the previous example, I used an attribute from string node to split a directory path into components, and then wanted to create a new attribute based on the last part of that path. Annoyingly I couldn't work out how to get an array element from the standard attribute create node, so I gave up and used a python script node instead:
renderpass = work_item.attrib('split')[-1] work_item.setStringAttrib("renderpass", renderpass)
Had another go, here's a purely node based approach:
The tops method for getting array elements is @attribute.index, eg @split.0. Here I wanted the last element, but there's no python style -1 syntax, so instead I create the array, reverse it, read the 0th element. Specifically:
- attribute from string top, split by delimiter enabled, '/' as the delimeter
- attribute array top, update existing 'split' attribute, reversed enabled
- attribute create top, uses `@split.0`
Ffmpeg and non sidefx rops
I had a few issues getting ffmpeg to make mp4's from a renderman rop. In the end the fixes were relatively straightfoward.
The ffmpeg top needs to know a few things:
- what part of the upstream nodes are making images, set with output file tag
- what the location of those images are, set with output parm name
- that the images are bundled into a single unit of work, using a waitforall top.
Top nodes can tag their output(s), in this case the ffmpeg top expects the images to have a 'file/image' tag. On the fetch top for renderman rop enable 'output file tag' and use the dropdown to select 'file/image'
To know what file name to put in that tag, enable 'output parm name' and set it to 'ri_display_0'. This is the parameter on the ris rop where the image path is set.
To bundle all the frames into a single unit, use a waitforall top.
A last specific thing for our setup, our build of ffmpeg didn't understand the '-apply_trc' option, so I disabled it.
Force python scripts to run on the farm
If you have a python script node, even if you have a tractor or deadline scheduler, it will run in your local houdini session by default.
To fix this, turn off 'Evaluate in process'.
Ensure a rop geometry top sim runs on a single blade
You don't want 240 machines all doing their own run up, that's silly. Go to the 'rop fetch' tab, enable 'all frames in one batch', that'll lock it to a single blade and run sequentially.
Tractor scheduler stuck
too often less often after some fixes from sidefx. Tricks to unstick it in order of least to most annoying:
- Make sure there's no active stuck jobs of yours on the farm, delete 'em all and try again
- R.click on tractor scheduler, 'delete temp directory'
- Select the tractor scheduler, ctrl-x to cut it, ctrl-v to paste it
- Reload the hip
- Restart houdini
- Quit FX
- Quit the industry
Rez and tops debugging
Running this in a python script top to see whats going on with rez and environment values:
print('debug info...') key = 'REZ_RESOLVE' print(key+'='+os.environ[key]) print ('') import pdg has_tractor = str(pdg.types.schedulers.has_tractor) print('pdg.types.schedulers.has_tractor: ' + has_tractor) print ('')
Mostly works, the long story can be found below, but here's the summary:
- Your environment needs access to the python tractor api. If you use rez, make sure to bring in a package for tractor.
- PDG assumes it'll find $PYTHON set correctly. We didn't have this, but even then I found I couldn't use the regular system python, but had to point it to hython ( $HFS/bin/hython )
- If your farm is behind a firewall, make sure your IT department chooses 2 ports you can use, and enter those ports into the callback and relay port fields on the tractor scheduler
- As of 18.0.502 retry support exists on the tractor scheduler, as well as options for better logging.
- Cooking jobs by default expects to connect to your desktop machine to update information, give you blinky lights and dots. This means that if you close your houdini session, the job will stop working on the farm. Call me old fashioned, but that defeats most of the point of using a farm. If you don't want this, use the 'submit graph as job' option at the top of the tractor scheduler, and it will run independent of your GUI session. Getting these to work reliably was problematic for us, YMMV.
Tops and tractor diary
Moving the diary to TopsTractorDiary.