Difference between revisions of "HoudiniTops"
|Line 651:||Line 651:|
# example running the hda processor for poly reduction or geo processing
# example running the hda processor for poly reduction or geo processing
# using the invoke top for geo processing(can use the top geometry sop)
# using the invoke top for geo processing(can use the top geometry sop)
# more uses of the partitioning tops, eg partitioning a wedged -> substepped simulations to give me the wedges, and then I combine at the end to use the partition by frame to run the correct set of frames
# more uses of the partitioning tops, eg partitioning a wedged -> substepped simulations to give me the wedges, and then I combine at the end to use the partition by frame to run the correct set of frames
Revision as of 15:18, 13 December 2020
Tops has been an emotional ride. I thought Chops was my most hated part of Houdini until picking up Tops about a year ago. I've gone from confusion, to disdain, to fury, but after nearly a year of fighting it, I'm learning to, well, not love it, but can appreciate what it can do, and can make it do some useful things.
A huge problem with Tops is terminology, documentation, workflow. Lops was in a similar boat, but at least that had the excuse of deriving from USD and having to stick to those concepts.
That said in December 2020 I had that 'click' moment. I've been able to do some complex work setups that would be difficult or impossible without Tops. It was a good time to update these notes, try and explain tops in terms Houdini artists can understand. If any of it isn't clear (or better, if you have fixes for my mistakes), let me know!
This is very wordy btw, sorry about that.
Tops vs Rops, obj vs sops, vs farms
Tops is a node graph for tasks and processes. It's similar to Rops in that it lets you chain operations together and control execution order. A major difference between Rops and Tops is fine grain control. Layering analogies, Rops compared to Tops is like /obj compared to Sops.
- /obj is a container based view of things, you don't see what's going on inside the nodes, and it has a limited set of nodes
- Rops is a container based view of rendering, you don't see whats going on inside the nodes, it has a limited set of nodes
- Sops is an atomic view of things, you can get right down to the points, it's much more general in scope
- Tops is an atomic view of tasks and processing, you can get right down to the frames/individual processes, it's much more general in scope
Because Tops has more awareness of what each node is doing, it can do things that would be difficult with rops or standard renderfarm software. Eg:
- ensure things are run in parallel as much as possible
- never have tasks waiting unnecessarily
- use as many cpu cores as you have
- detect when tasks don't need to be re-run, skip over them
- when connected to a farm, allow complex control over many machines doing many tasks
- put a simple gui on simple operations
- provide a full python api for complex operations
- has built in support to control other apps, eg maya, nuke, ffmpeg
- has built in support to talk to asset management systems like shotgun
- has built in support to work with tractor, deadline, hqueue
Some of those features work great, others feel a little early and rough, but the potential is there.
Workitems vs points
If going from /obj to sops lets you play with points, going from rops to tops lets you play with workitems, the atomic unit of tops. At its simplest you can think of a workitem as a frame. If a mantra top renders frames 1 to 100, you'll see 100 workitems in the GUI, represented as little dots.
Workitems can be many things
Workitems are more than just frames of a render. In a similar way that a point in sops can represent a particle, a vertex of a shape, a packed primitive, an RBD object, a workitem could be:
- a frame of a render
- an entire image sequence
- the location of a mp4 on disk
- an asset name
- whatever you want it to be
Workitems can be collapsed into partitions if you need that, and expanded out again later into individual workitems. This is similar to packing and unpacking geometry in sops.
No really, workitems can be anything
It's worth emphasising how workitems can be much more than frames.
Everything in tops is a workitem, i.e. a unit of work, a process that is done.
It's intentionally generic nature means it can feel complex compared to Rops, but means it's capable of a lot more. A lot of pipeline tasks, even non vfx tasks can be handled in Tops, because ultimately you're not controlling 3d or geometry, you're controlling processes and tasks.
Generating vs cooking
More sops analogies! Watch some of the masterclasses about sops, there's a LOT going on that users aren't aware of. Sidefx folk talk about dirtying sop nodes, traversing bottom up through the graph setting states to understand what nodes need to be cooked, then working top down to do the node cooking. In other words, it first generates a list of work to be done, then it goes and cooks the nodes.
While Sops hides all this from the user, Tops exposes those generate+cook steps to the user, and gives you explicit control to separate them if required.
For simple graphs you don't have to worry about it, Tops will generate and cook automatically. For bigger graphs, you might have some nodes that could take hours to cook, maybe lock up all of a studios renderfarm. It would be hard to create or debug a Tops graph if you had to wait hours on every run!
If you can work with the generate step seperate from the cook step, you might be able to do quite a lot of work designing your graph without ever having to execute any nodes, sort of like keeping Houdini in manual/nocook mode.
To use this you can right click on a node and choose 'generate'. If the node is smart enough, it will generate workitems for you, which you can choose to cook, or just continue to the next node in this high level 'design' mode. When you're ready to both generate and cook a node, you can do this from the r.click menu, or with the keyboard shortcut shift-v.
Inputs and outputs
If tops nodes are to be chained together, they need to pass workitems betweeen them. Similar to Sops and point attributes, Tops workitems have workitem attributes. While Sop points have @ptnum and @P, workitems have an index, input and output defined. Broadly speaking, input is what the node will work on, and output is the finished result of that node.
In the gif above I run have a few nodes chained together. The inputs and outputs are as follows:
- Filepattern - makes workitems from a directory path, eg $HIP/myvideos/*.mp4
- Input - nothing (it doesn't have an input from a previous node)
- Output - a path to an mp4
- Ffmpegextractimages - given a path to a video, create an image sequence.
- Input - a path to an mp4
- Output - an image sequence
You can see these by double clicking on a workitem, the input and output will be listed. Note that if you generate rather than cook, you'll only see inputs (but sometimes you'll see an 'expectedoutput' attribute if the node is clever enough to guess it).
What if you need to get to those attributes in code? What if you want to define your own attributes? What if you want to use those attributes outside of tops?
The input and output are exposed as @pdg_input and @pdg_output, as well as a few other built-ins. They're listed here in the sidefx docs, but I've made a cheatsheet. Think of these like @ptnum, @P, @N, @v etc:
- @pdg_input - the input to the workitem
- @pdg_output - the cooked result of the workitem
- @pdg_id - the unique id, usually appears like a many digit hash, eg 25325
- @pdg_index - the 'public' index, closer to standard @ptnum 0, 1, 2 etc, but careful, its not guaranteed to be unique!
- @pdg_name - combo of name of the node and the workitem id
- @pdg_frame - the frame if you're doing frame based rendering, 0 if you're not
- @pdg_inputsize - the size of the input (eg if the input is an image sequence, the length of the sequence)
- @pdg_outputsize - the size of the output
These attributes are exposed to the rest of houdini, and are available in parameters wherever you'd use $HIP, $OS, $F, $T etc.
In the gif I use a file pattern top to find *.jpg in a folder. I can then use `@pdg_output` in cops, and the file cop will load the image from the workitem.
Because these are most often used in parameters, the parameter/hscript rules apply; integers and floats usually work directly, strings will usually need to be wrapped in backticks.
Tops vs standard workflows
So far so good right? But here's the big caveat, and where you should think twice before applying Tops to all your hips.
Imagine you hadn't read this page, didn't know Tops, and had been given a hip using the tricks so far. Would you be able to understand what's going on?
Worse, when loading a hip that uses Tops, the Tops graph is always loaded uncooked. That means no workitems exist, meaning no workitem attributes exist, meaning anything in the rest of Houdini using workitem attributes will be errored at best, or silently broken at worst.
Tops is not forgiving to people who don't know Tops. I mean sure, neither is Chops, nor Lops, nor esoteric Dops stuff, but at least with most of those you can save the hip in a working state, and others can load it in that state.
Another thing that can be confusing is the seemingly simple question of 'what is the current state of the graph?'
In the previous example, I used @pdg_output on a file node in cops. Fine, but what if I append a ffmpegencodevideo to make a mp4? If I click on a workitem in that node, cops can't load mp4s, surely the setup is broken? And what if I append a shotgun publish node, and now @pdg_output is a URL of a Shotgun query?
The answers become clearer over time, but ultimately this stuff is all New, and New can be Confusing.
And there's more concepts to deal with! Now that I'm out the other side of the learning tunnel, I feel its not as confusing as I first thought, but understand it's not just an instant drop in replacement for Rops.
Generate mode, automatic vs other
Short version: I mentioned generate vs cook as two seperate steps in Tops, and Tops will usually automatically do the right thing. If you start getting weird results, change the automatic mode ( 'generate when' to 'each upstream item is cooked'), things should start behaving.
The ideal tops node should be able to generate workitems without cooking. For example a Mantra Top can look at the frame range parameters, and know you'll need 100 workitems without needing to render.
There's cases where that's not possible. Eg a ffmpeg extract images can't look at the parameters to know how many frames are in the video(s) it will process; it has to actually run ffmpeg, extract the frames, count the result.
In these cases, the 'generate when' parameter tells tops not to try and predict the workitems. Even more accurately, it tells the node that it has to wait for the results of the previous nodes output before it tries to calculate its own workitems.
The chances of miscalculated workitems increases as Tops graphs get more complex. For example a graph might make temp folders, write results in those folders, count the number of files made in those temp folders, and do other things.
The 'count the number of files in the temp folder' node (a filepattern top) in its default state will try and generate workitems first, before upstream nodes have cooked. So it might find no files (the temp folder may not even exist yet in that first generate step), which will break results downstream. But then when you inspect the finished cooked graph, the folder exists (as now the upstream node to make the directory has cooked), the folder is full of files, and even if you recook the filepattern top now, you'll see it gets results and it all looks fine. Confusing!
Sidefx are trying to fix catch-22 situations like this, but its good to be aware that things like this can happen.
Short version: If multiple runs of your tops graph look the same, delete results on disk, try again.
Long version: Because Tops can get right down to that atomic workitem level, it can do some tricks that aren't possible in Rops or other systems. A big part of this is recognising when parts of the network have already been run, and don't need to be recooked.
The example I keep coming back to here of processing a folder of mp4s. Say you had the following chain of tops nodes:
- ffmpeg extract images to convert mp4s into image sequences
- fetch to to run a cops graph that processes those images, renders to a temp location
- fetch to run a geometry rop that traces the images
- fetch to a opengl rop
- fetch to a VAT rop
Obviously a lot of that only needs to run once if you're making minor changes here and there to the network, or if you add a new video, it should only procees that video and skip the rest.
As part of the cook process, wherever possible top nodes will check if output exists where it expects to write files. If it does, it will mark that workitem as complete, skipping a lot of processing time.
Of course this is great when you expect it, infuriating when you don't.
Most nodes have a r.click option after its been cooked, 'delete this nodes results from disk'. For the most part it does the right thing, and it will then of course force a recook of all the workitems on the next run.
If you just want a single workitem recooked, you can go on disk and delete whatever cache that is. I've found there's a r.click menu per workitem dot to get that atomic with your deletes.
Note that sometimes tops gets confused and won't delete files, or will delete too much, best to keen an eye on your files while doing these operations until you get a feel for it.
Caching, be aware of it, make it work for you.
Simple cache a sim then render locally
Download hip: File:pdg_basic_v01.hip
Most existing Houdini users want the basics from PDG; cache a sim to disk, run a render. Maybe chaser mode as a bonus? FFmpeg the result into an mp4, why not eh, YOLO!
Here's that setup. Click the triangle with the orange thingy on it to start. This was one of the first Tops setups I created, I've been told some things aren't ideal, like the map by index (apparently all mapping will gradually be phased out). One day I'll revisit this and clean up.
cache sim is a fetch top that points to a disk cache sop after a simulation. You DON'T want a sim cache running on multiple threads/machines, it should just be one job that runs sequentually. To do this enable 'all frames in one batch'.
map by index controls execution order and limits how jobs are created. If you have node A that generates 10 things, connected to node B that is also set to generate 10 things, PDG's default behavior is to generate 10 B things for each thing made by A. In other words, you'll get 10 x 10 = 100 total tasks. For situations like this, that's definitely not what you want.
The mapbyindex ensures tasks are linked together, so 1 frame of the cache is linked to 1 frame of the render. Further, it allows a 'chaser' mode, in that as soon as frame 1 of the sim cache is done, frame 1 of the mantra render can start, frame 2 of the sim cache is done, frame 2 of the mantra render can start etc.
mantra is another fetch top that points to a mantra rop.
waitforall does as implied, it won't let downstream nodes start until all the upstream nodes are completed. It also subtly adjust the flow of tasks; the previous nodes have 48 dots representing the individual frames, this node has a single rectangle, implying its now treating the frame sequence as a single unit.
ffmpeg top needs some explaining (and some adjustments to the fetch top that calls the mantra rop), which I explain below.
Note that the frameranges on the fetch tops override the ranges set on their target rops by default.
Also note that the button with the orange thingy on it kicks off the output, looking for the matching node with the orange output flag. See in that screenshot how I've left it on the mantra node? That means it'll never run the ffmpeg task. I'm an idiot.
Wedging a rock generator
It was either this or a fence generator, lord knows we need more tutorials on both these important things.
Here's the hip before getting into tops if you want to follow along:
In this hip is a straightforward rock generator. The sops flow is
- high res sphere
- some scattered points with random scale and N+up
- copy spheres to points
- point vop to displace spheres with worley noise
- attrib noise for more high frequency detail
- attrib noise for colour
- Cd converted to HSV and back again to roughly match the colour of the env background.
So with this all setup, we could randomise a bunch of stuff with tops.
The first thing we'll do is wedge the number of points in the scatter. We'll create a wedge top, which will make a pdg attribute we can reference on the scatter.
- Create a topnet
- Create a wedge top
- Set the wedge count to 5, so we get 5 variations
- Add a new wedge attribute with the multilister
- Attrib name scatternum
- Attrib type Integer
- Set start/end to 2 and 6, so we'll generate a minimum of 2 scatter points, a maximum of 6.
- Shift-v on the node to cook it and see what we have so far.
Middle clicking on each workitem, we can see that each workitem has a scatternum attribute, starting at 2 and ending at 6. That might be useful for other things, but here we don't want it to be gradually increasing, we want it to be a random integer between 2 and 6. Enable random samples, cook, look again.
That's better, random samples for each workitem.
To use this in the scatter sop, all we do is type @scatternum into the scatter force total count parameter, and bam, its connected.
Add more entries to the wedge multilister, fill in parameters over in sops, bosh, you have a wedge setup. Note that when you create vectors, you access the components with @attrib.0, @attrib.1, @attrib.2.
Eg here I create a noiseoffset wedge, and drive the point vop noise offset with it.
So that's all wedging, how do we write this out? We could use a disk cache sop, and set the output name to use @pdg_index, which corresponds to the id of each workitem (ie, 0 to 4 in this case of 5 wedges). Or you could use a rop geometry output top which basically does the same thing.
- append a rop geometry output top to the wedge
- set the sop path to the end of your sop chain, /obj/rocks/OUT_ROCK in my case
- set the output file parameter to make a unique file per workitem, eg $HIP/geo/$HIPNAME.$OS.`@pdg_index`.bgeo.sc
- Now if you click through the workitems in the previous node, you can see the file path change if you mmb on the label.
- cook the node, and it'll be baked to disk.
Here's the finished hip:
Download hip: File:tops_rockgen_end.hip
Wedge a sim and generate a fancy QC mp4
Download hip: File:top_wedge_sim.hip
Ooo, combinations of attributes, captions on the render, an mp4 to comfortably view on the couch, this the houdini dream! And not too hard to setup either.
While the previous example used a single wedge top to do all the randomising, in this case we'll use 2 wedge tops. The first will make 5 workitems, the second will make 5 workitems for each incoming workitem, so 25 in total.
Here's the popnet we'll be working with:
I've found when setting up top wedging its handy to have the topnet visible in one panel, and the popnet (or whatever) visible in another panel. By using the P hotkey within each pane to toggle the parameters, and making sure each network is pinned, you can alter both networks pretty easily.
I want to wedge the popdrag air resistance, and the popwind noise amplitude. I make a wedge top, set wedge count to 5, create a tops_drag_air_resist attribute, and type @tops_drag_air_resist into the relevant field in the pop sim.
I connect another wedge top and repeat the process for tops_noise_amp. Cook the node, you'll see 25 workitems created, if you step through them you'll see each combo of the 2 attributes being set on the sim.
Now to render this. Move over to rops, create an opengl rop, set the camera. The output files need to be unique per wedge combination, otherwise competing workitems will start overwriting results. A simple way to fix this is to make @pdg_index part of the output filename. Because this is a string parameter, it needs to be wrapped in backticks:
Don't bother setting the framerange, we'll do that from tops.
In tops append a fetch top, point it at the opengl rop you just made. Set Evaluate Using to 'Frame Range', set the frame range you want and enable 'Cook Frames as Single Work Item'. This means the workitems will end up being an array of the rendered images per wedge combination, rather than generating a workitem for each individual frame.
Cook now, you'll see renders start, and folders fill up with images. The last step is to make an mp4 out of these image sequences.
Append a partition by node top, cook it. You'll see all the workitem dots are now represented by a single rectangle. If you double click it and browse the output, you can see its an array of all the images from all the workitems:
That's the perfect array of images to feed to a ffmpegencodeimages top. Do that, cook, hey presto, a mp4 of all the output.
Buuut.... its just lots of particle variations, there's no way to tell what wedge values correspond to what render. We can fix this by using the comment feature on the opengl rop.
Go back to rops, bring up the parameters for the opengl rop, and in the scene tab go down to viewport comment. This can be used to create a text overlay on the render, so if we use tops attributes here, we can read that in the final renders. R.click on the parameter and choose 'Expression->Edit expression', and combine text and attributes to design a useful overlay:
noise amp: `@tops_noise_amp` air resist: `@tops_drag_air_resist` frame: $F
Except... it won't work. When you hit accept on the editor, it makes the parameter green, assuming its an expression, and the expression isn't valid hscript. Ugh. Luckily hscript is pretty forgiving, all you need to do is wrap the whole thing in double quotes:
"noise amp: `@tops_noise_amp` air resist: `@tops_drag_air_resist` frame: $F"
Check it! Fancy wedge like the grownups do!
Mosaic of all the things
Download hip: File:top_wedge_sim_mosaic.hip
Mosaic, or contact sheet, or montage as its strangely called within Imagemagick, the ultimate way for a time pressed fx lead to review their teams work and say 'I hate it all, start again'.
That I could do this without reading the docs surprised me, taking that as further proof that I'm starting to understand Tops.
The Imagemagick node lets you do image processing. One of its tricks is a mosaic effect, which they call 'montage'. If you give it an image sequence of frames 1 to 10, you'll get those 10 frames laid out together in a grid.
That's almost what we need here, but rather than giving it frames 1 to 10, we want it to grid together all the frame 1s from all the wedges, then all the frame 2s, frame 3s etc... once we have all those written to disk, we can use ffmpeg to combine them into an mp4.
Picking up from the previous example, the fetch top has 25 wedges of 50 frames. We'll use tops to transpose these results into 50 frames of 25 wedges. Think of all the results laid out on a grid. The fetch/opengl top has the results grouped by rows, I need them grouped by columns.
I've done it in a kind of roundabout way:
- copy @pdg_index to a new attribute, @wedgeindex
- workitemexpand top so that we go from 25 items of 50 frames, to 1250 individual frames as workitems
- the frame number isn't stored per workitem, so create a frame attribute by parsing the file path on each workitem
- partitionbyattribute using @frame, and on distinct attribute values. This partitions all the frame 1 images together, all the frame 2 images etc
- The results come back slightly out of order, but on the advanced tab, we can tell the partition top to sort by the @wedgeindex we created in step 1
- Imagemagick top, montage mode, this makes a mosaic from each partition. Because we have all the frame 1's in a partition, they all get assembled into an image, as do frame 2, frame 3, frame 4...
- waitforall (or partitionbynode), basically just get it to make a single array of all the images so that we can...
- ffmpegencodevideo to make an mp4
Tops and sims further work
Looking at the result of the last 2 examples, there's a few things to note:
Nice names help discoverability
I feel sidefx made pdg attributes too subtle, too easy to hide. At a glance they have @ prefixes just like regular attributes, parameters using tops attribs turn green like regular expressions. In my world, they'd use a different prefix ( # maybe? Or ~, ie a tilde for tops? Or t@foo instead of @foo ?), and expressions would go a different colour when driven by tops attributes.
Until Sidefx see reason, you can help yourself and others by naming your attributes nicely. Don't use @noiseamp, use @tops_noiseamp. See? Now it's clear it's from tops!
Fetch top Frame by frame vs complete sequence
If you run a regular sim through a regular rop without caching the sim first, Houdini will sim each frame as required, then render. If you send that rop to a render farm, and the farm splits the job into one frame per render blade, then each blade will have to sim from frame 1 to the frame it needs to render. Not very efficient.
The better option would be to cache the sim first, then have each blade render the cache. Tops is no different. If this were a heavy sim, I would put a disk cache sop after the popnet, use a fetch top to execute it (making sure the cache had a wedge index so they don't overwrite each other!), and then I could run the opengl fetch top in whatever mode I want.
Workitems as frames vs workitems as image sequences, general efficiency
Related to the above, if I were rendering from a cache instead, then I could have the workitems render individual frames rather than the entire sequence. But think about this; each workitem is having to spin up its own instance of houdini, the opengl rop, render, then shut itself down again. For a simple render like this, it's probably doing more work just to startup and shutdown than to actually render. It's probably more efficient to startup, do the sequence, then shutdown. At a big scale this would also be impacted by licences; each tops graph in a render farm would consume a license per blade, you'd quickly eat up all your licences in a churn of blades starting and shutting.
Workitems as arrays elegance vs workitem busy work later on
This setup looks neat early on with with the fetch, partition, ffmpeg top to make a sequenece of renders, but gets ugly when making the mosaic as I have to do lots of work to make attribs, expand, partition, sort.
Alternatively I could have run the fetch to make a workitem for every frame (if I cached the sim first), which would mean I could directly partition the results by frame. Neater, but now potentially less efficient.
I don't know if one way is more elegant than the other (and I'm assuming there's probably a better way to make the partitions), but it's something to keep in mind.
Debug workflow with cached results, generate vs cook
I'm getting better at predicting when tops needs to do work, and when I can take shortcuts. I know that once the fetch top has run the opengl rop, any of the downstream nodes that are just manipulating workitems and attributes are basically instant. It's only when nodes have to do work (ie the ffmpeg and the imagemagick nodes) that I'd get a processing time hit.
Hence a lot of the debugging was appending nodes after the opengl fetch top, cook, inspect a result, frown, try another node, cook, inspect, grin, append a new node, cook that, tweak it, recook, inspect, etc. Tops is more efficient to work with for debugging workflows than I first gave it credit for.
Specifically for the debugging, the actual workflow is to say append a node, cook it, double click a workitem dot, look at the output. If its an array of frames, expand that to see if its doing what I expect. Eg for the mosaic workflow I could see it was collating all the frame 1s, frame 2s etc together in the partition sop, but the results weren't sorted:
That's how I knew I had to store the workitem index so I could sort it properly later. It's not quite as fluid as using the geometry spreadsheet in sops, but it's not bad.
Even when hitting the heavy 'do work' nodes, tops will often aggressively lean on the cached results on disk, and rarely do work if it doesn't need to. I'm trusting it more and more to do the right thing, and generally know it'll only start doing heavy lifting if I right click on a node and select 'delete results on disk'.
Tops vs PDG?
They're essentially the same thing. PDG stands for Procedural Dependency Graph. You could get really pedantic and say that TOPs, or Task Operators, are the nodes within the Procedural Depdendency graph.
You might call also Sops a procedural geometry graph (PGG) and Cops a procedural compositing graph (PCG), but you don't, you call them Sops and Cops. I think using PDG as a name is needlessly confusing (moreso with the builtin @pdg_ prefix on attributes), so I'm trying to say Tops whenever I can.
The help mentions some stuff in the task menu, I couldn't see it.
Well, it's there. Hidden in plain sight.
What do the colours mean in the mmb info for workitems?
Colours represent the attribute types.
- dark cyan - internal attributes
- pink - string
- green - integer
- yellow - float
When pdg_input and pdg_output are blank
In summary, check the nodes above the erroring one, especially if they have an unchecked copy inputs to outputs toggle.
I've had a few occasions where I'll have a node error, go inspect a workitem and see that @pdg_input or @pdg_output is missing or empty. The culprit is always an upstream node. Related to what I mentioned before, nodes usually expect an input, and most of the times set an output.
This isn't mandatory though! If you insert nodes into an existing network, those node might not get the input nor set the output, always check their middle click info panel to see what going on.
- Wedge tops if appened to a filepattern top will have inputs and outputs, but wedge nodes by themselves don't, even if chained together.
- Generic processor tops won't generate output by default, unless you enable 'copy inputs to outputs'.
Set a limit on the number of workitems
Say you have a folder full of images that you want to process, but for testing just want the first 5 images.
A filterbyrange top will let you do this.
Pick 2 items from each group of workitems
I have a filepattern searching an inputs folder. That folder is full of subfolders, say animal names, and in each of those are mp4's I want to process. So the folder structure might look like this:
Now say while testing I only want to grab 1 or 2 from each category?
A partition top lets you collate workitems, sort of like packing geometry. You can partition in many ways, here I'd use a partition by attribute top. I'd use an attrib from string top to split off the animal type into its own attribute, @animal, then partition by 'animal'. Make sure 'partition attributes independently' is set.
Now we can unpack them again using a work item expand, but handily this has several ways to do that unpack. In this case 'first N', sounds right, so we can expand only the first 2 from each animal.
On this node, make sure apply expansion to: is set to 'items in upstream partition', otherwise it can do odd things like duplicate the first item it finds per animal twice. Remember @pdg_index isn't guaranteed to be unique in the way @ptnum is, so when relying on tools to sort by index, it can easily grab stuff you don't expect.
Finally to make sure the output is sorted per animal, a sort top can be used, with the name parameter set to 'animal'.
Make workitems from a string array
Download hip: File:tops_animals.hip
Or the alt title 'make a bunch of fake animal mp4s from some text cos I'm too lazy to make them manually'.
The previous example used to have some quoted text where the screenshot of the files go, like this:
/inputs/dog/poodle.mp4 /inputs/dog/pug.mp4 /inputs/dog/terrier.mp4 /inputs/dog/labrador.mp4 /inputs/cat/black.mp4 /inputs/cat/striped.mp4 /inputs/cat/white.mp4 /inputs/cat/ginger.mp4 /inputs/cat/longhair.mp4 /inputs/bird/parrot.mp4 /inputs/bird/gull.mp4
I thought a screenshot of files+folders would look better, but how to make fake mp4s with all those locations quickly? That itch to eat my own dogfood was strong, so I figured there must be a tops way. Here's how:
- Attribute Create, name 'test', paste in the text, which appears with newline symbols ( ¶ ) and spaces.
- Attribute From String to split @test into an array using a space as the delimeter. This creates a new attribute @split, which is a string aray of the filepaths.
- Work Item Expand, expanding on 'upstream attribute', and the attribute is 'split'. Now I have 12 workitems, each has @expandvalue set, which is the name of the fake mp4.
- Text Output, set the path to $HIP/animals/`@expandvalue`, cook it, hey presto, files on disk!
Note that the first workitem errors. The first split element is an empty string, so the Text Output node complains that it can't create that file. I could limit by range and remove the first workitem, or use a rule or something to remove it, but I have my fake mp4s now, no need. :)
workitem @pdg_index isn't unique like @ptnum
For a while I was treating the workitem @pdg_index like @ptnum, in that I assumed it was always unique, would resort and reindex as you add and remove workitems.
I've since found that's not the case. If you do things like merge different streams, or do partitioning and unpacking, you could have 5 workitems that all have a @pdg_index of 0. That means if you do things like say limit by range to 0, thinking you'll just get the single zeroth entry, you'll get back 5.
A bit confusing, you have to use things like sort tops and other nodes to force workitems to reindex, I'm still working out the best method for this.
Get framecount from ffmpeg extract images
The ffmpeg extract images node usually generates a output attribute which is an array of the images its created. At some point this stopped working for me, so I had to find another way to count the images.
After talking to support it was due to putting quotes around the output parameter path. Doh. Still, leaving this here as it's still a good mini example of tops workflows.
So as I said, we could try and find the files in the ffmpeg output folder for the node. A filePattern top can do this. If the ffmpegextractimages node is using the default
The following filepattern node should point to the folder above it, ie
It also has 'split files into seperate items' turned OFF, so that way I just have a single workitem per image sequence.
If you try this now, it won't generate any workitems. The default generate mode will mean it tries to look in that folder straight away, finds no images, and as such returns no workitems. Change the generate mode to 'each upstream item is cooked', then it works as expected.
Ok great, but where's the actual number of frames? It's there, annoyingly hidden. There's some tops attributes that don't appear in the middle click info, one of those is @pdg_outputsize. In this case, unsurprisingly, it returns the amount of frames in the sequence. So with an attribute create node, you can create a new integer variable called framecount, and set an expression to use @pdg_outputsize.
Note that you don't need to change the generate mode on the attribute create. As soon as any upstream node is set to be dynamic (ie, it has to wait for previous items to cook), all subsequent nodes are also made dynamic.
Create attribute from string array element
Related to the previous example, I used an attribute from string node to split a directory path into components, and then wanted to create a new attribute based on the last part of that path. Annoyingly I couldn't work out how to get an array element from the standard attribute create node, so I gave up and used a python script node instead:
renderpass = work_item.attrib('split')[-1] work_item.setStringAttrib("renderpass", renderpass)
Had another go, here's a purely node based approach:
Tops array attributes can be accessed with @attrib.0, @attrib.1, @attrib.2 etc, but there's no python style @attrib.-1 to get elements from the end of the list. As a workaround we can simply reverse the array. So:
- attribute from string top, split by delimiter enabled, '/' as the delimeter. A @split attribute has been created, an array of the file path components.
- attribute array top, update existing 'split' attribute, reversed enabled
- attribute create top, uses `@split.0`
Ffmpeg and non sidefx rops
I had a few issues getting ffmpeg to make mp4's from a renderman rop. In the end the fixes were relatively straightfoward.
The ffmpeg top needs to know a few things:
- what part of the upstream nodes are making images, set with output file tag
- what the location of those images are, set with output parm name
- that the images are bundled into a single unit of work, using a waitforall top.
Top nodes can tag their output(s), in this case the ffmpeg top expects the images to have a 'file/image' tag. On the fetch top for renderman rop enable 'output file tag' and use the dropdown to select 'file/image'
To know what file name to put in that tag, enable 'output parm name' and set it to 'ri_display_0'. This is the parameter on the ris rop where the image path is set.
To bundle all the frames into a single unit, use a waitforall top.
A last specific thing for our setup, our build of ffmpeg didn't understand the '-apply_trc' option, so I disabled it.
Force python scripts to run on the farm
If you have a python script node, even if you have a tractor or deadline scheduler, it will run in your local houdini session by default.
To fix this, turn off 'Evaluate in process'.
Ensure a rop geometry top sim runs on a single blade
You don't want 240 machines all doing their own run up, that's silly. Go to the 'rop fetch' tab, enable 'all frames in one batch', that'll lock it to a single blade and run sequentially.
Tractor scheduler stuck
too often less often after some fixes from sidefx. Tricks to unstick it in order of least to most annoying:
- Make sure there's no active stuck jobs of yours on the farm, delete 'em all and try again
- R.click on tractor scheduler, 'delete temp directory'
- Select the tractor scheduler, ctrl-x to cut it, ctrl-v to paste it
- Reload the hip
- Restart houdini
- Quit FX
- Quit the industry
Rez and tops debugging
Running this in a python script top to see whats going on with rez and environment values:
print('debug info...') key = 'REZ_RESOLVE' print(key+'='+os.environ[key]) print ('') import pdg has_tractor = str(pdg.types.schedulers.has_tractor) print('pdg.types.schedulers.has_tractor: ' + has_tractor) print ('')
Mostly works, the long story can be found below, but here's the summary:
- Your environment needs access to the python tractor api. If you use rez, make sure to bring in a package for tractor.
- PDG assumes it'll find $PYTHON set correctly. We didn't have this, but even then I found I couldn't use the regular system python, but had to point it to hython ( $HFS/bin/hython )
- If your farm is behind a firewall, make sure your IT department chooses 2 ports you can use, and enter those ports into the callback and relay port fields on the tractor scheduler
- As of 18.0.502 retry support exists on the tractor scheduler, as well as options for better logging.
- Cooking jobs by default expects to connect to your desktop machine to update information, give you blinky lights and dots. This means that if you close your houdini session, the job will stop working on the farm. Call me old fashioned, but that defeats most of the point of using a farm. If you don't want this, use the 'submit graph as job' option at the top of the tractor scheduler, and it will run independent of your GUI session. Getting these to work reliably was problematic for us, YMMV.
Tops and tractor diary
Moving the diary to TopsTractorDiary.
Some neat ideas for examples/topics from Jeffy Mathew Philip:
- converting between one format to another(like ingesting a kitbash set and spitting out individual alembics, or fbxs or usds) - like building an asset zoo.
- example running the hda processor for poly reduction or geo processing
- using the invoke top for geo processing(can use the top geometry sop)
- more uses of the partitioning tops, eg partitioning a wedged -> substepped simulations to give me the wedges, and then I combine at the end to use the partition by frame to run the correct set of frames