Superfeather Engine

Introduction

Superfeather is a framework designed to help you create fast and colorful 16-bit games for bsnes/higan, snes9x, and of course real hardware. It includes functionality common to many games, such as a gameloop, object handling system, and sprite management. The three core principles behind the framework are performance, flexibility, and convenience.

In practice, these principles sometimes overlap - for example, the idea that game object code only use as much functionality as it needs falls under both flexibility and performance. Sometimes, however, these principles may conflict with one another. If a subsystem is designed around performance but is not very convenient to use, that makes it a candidate for being reworked to fit both of these principles.

Features

The following features are currently implemented:

These features may show up in a future release:

Prerequisites

Before you get started, you will need the following:

On Windows, the necessary binaries to run the build script should already be inside the superfeather-bintools-win/ folder. On Linux, the tool binaries reside in superfeather-bintools-lin/ (x86_64 architecture required to use these). If the pre-built binaries don't work for you, download and/or build cc65 yourself, and then place (or symlink) the ca65 and ld65 binaries within this directory, replacing the old ones. You should now be able to run the build script to generate a working ROM.

Coding Conventions

You should get acquainted with the following practices before you dive in - some of them are quite important and familiarizing yourself will save you a lot of headaches later:

In general, you should be using a 16-bit register size for the accumulator and index registers. This is a 16-bit system, so you should be writing 16-bit code where appropriate. You should only ever set the registers to 8-bit where you need to do 8-bit writes, or are otherwise doing arithmetic that (only) needs 8-bit registers. Many performance-critical functions will expect that you have the registers set to AXY16 beforehand, in order to avoid the overhead of needing to preserve the status register.
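As a rough illustration of this convention, the fragment below drops the accumulator to 8-bit for a single register write and then returns to AXY16. It is only a sketch: it uses plain rep/sep together with ca65's .a8/.a16 directives rather than any helper macro the framework may provide, and it assumes the current data bank maps the PPU registers.

```asm
    ; Assumed entry state: AXY16 (16-bit accumulator and index registers)
    sep #$20        ; set the M flag: accumulator becomes 8-bit
    .a8             ; tell ca65 to assemble 8-bit immediates for A
    lda #$0f        ; forced blank off, full brightness
    sta $2100       ; INIDISP is an 8-bit PPU register
    rep #$20        ; clear the M flag: accumulator back to 16-bit
    .a16            ; tell ca65 that A is 16-bit again
```

Keeping the 8-bit window this short makes it harder to accidentally call an engine routine that expects AXY16.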

Some functions (such as AddSprite and CreateObject) will return a boolean value indicating success of the operation. The status register's less commonly used overflow bit (V) is used for this purpose. It is important that you handle failure scenarios by branching on the overflow flag (bvs, bvc). If overflow is set, the operation failed. If the failure case is not handled, you may (will) introduce errors in your program.

For object-oriented functions, it is customary to have the X index register set to the index of the current object. Many functions that operate on objects will expect X to be set in this manner, so that there is no need to move this value to different registers in order to perform multiple operations on the same object.

The direct page register (D) is used as a third index register. Don't assume that it will always be set to 0.

Be aware of the requirements and side effects of calling into functions. With the exception of interrupt handlers, many functions will leave some CPU registers altered, especially if they are functions that are called frequently. This information is documented inside the source files containing these functions.

A number of macros are provided for your convenience. Code macros begin with Macro_ to differentiate them from subroutines. Data macros begin with Define_ and are the preferred way to define some of the data structures used in the framework.

Several code templates are included in the templates/ folder. Use them as a starting point for your own code.

Memory Layout

Generally, games on the SNES use either LoROM or HiROM. Superfeather uses HiROM but with LoROM for the upper 32 kB of each 64 kB code bank. The memory layout looks like this:

          xx0000 - xx7fff    xx8000 - xxffff
Bank 0    Code 0 Lo          Code 0 Hi
Bank 1    Code 1 Lo          Code 1 Hi
Bank 2    Data 2
Bank 3    Data 3
...       ...

Basic Use

Memory Bank Details

Most engine routines use the upper half of code bank 0 (Code 0 Hi). Therefore, code that is called frequently or calls the engine routines frequently should be in that same bank. Code that is large but isn't called often (e.g. your player character thinker) would be a good candidate for a different bank in order to save space in Code 0 Hi. Below is a table comparing each of the different code banks:

                            Code 0 Lo              Code 0 Hi        Code 1 Lo              Code 1 Hi
Bank used                   c0                     80               c1                     81
Address range               c00000 - c07fff        808000 - 80ffff  c10000 - c17fff        818000 - 81ffff
Can access RAM + Registers  Direct page/long only  Yes              Direct page/long only  Yes
                            No function pointers                    No function pointers
Can access Code 0 Lo        Yes                    Long only        Long only              Long only
Can access Code 0 Hi        Yes                    Yes              Direct page/long only  Direct page/long only
Can access Code 1 Lo        Long only              Long only        Yes                    Long only
Can access Code 1 Hi        Long only              Long only        Yes                    Yes

For code banks marked "long only" or "direct page/long only", absolute access of data is available if the data bank (DB) is changed to the appropriate value.
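To make this note concrete, here is a hedged sketch of both options for code running in Code 0 Hi that wants to read a word from a table stored in Code 1 Lo. MyTable is a hypothetical label; the f: prefix and the ^ (bank byte) operator are standard ca65 syntax. AXY16 is assumed on entry.

```asm
    ; Option 1: long addressing, no data bank change needed
    lda f:MyTable               ; MyTable is hypothetical, located in bank c1

    ; Option 2: temporarily point DB at MyTable's bank for absolute access
    phb                         ; save the current data bank
    sep #$20
    .a8
    lda #^MyTable               ; bank byte of the label
    pha
    plb                         ; DB now selects MyTable's bank
    rep #$20
    .a16
    lda .loword(MyTable), y     ; absolute,Y read through the new DB
    plb                         ; restore DB before touching RAM or registers
```

While DB points at a Lo bank, absolute accesses no longer reach RAM or registers, so the second form is mostly worthwhile when several reads from the same table are grouped together.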

Indirect (function pointer) addressing via RAM is not available when running code in Code 0 Lo or Code 1 Lo. This is because the indirect access is done in the program bank rather than the data bank, and thus reads the pointer out of ROM instead of RAM.

The provided memory configuration targets 256 kB (2 megabit) initially. Of course you will likely want to use a larger size based on your project/assets. You can add more banks (or customize existing ones) via the ca65-map.cfg file in the project root. If you change the ROM size, you must remember to update the header in header.asm.

The Gameloop

In any game engine, the gameloop forms the skeleton of logic that goes into each frame. Below is a basic timeline of what happens in one frame:

When VBlank occurs, if the frameReady flag is set, the display is updated:

If VBlank occurs before the frame is finished (frameReady is still clear), these steps are ignored as it would otherwise result in a partially-updated display.

Moving Between Scenes

To switch scenes, set the sceneInitFunc and sceneThinkFunc function pointers, and then set switchScene. The scene switch will occur after the current frame of the gameloop. As for the functions themselves, your init function should do the following:

There is a long function, SetupPpuRegisters, which takes a pointer in index register X to a PpuDef data structure, which you can define via the Define_PpuDef macro. This allows you to easily initialize the registers to the values you want for your scene.
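A minimal sketch of the calls described above, assuming AXY16, that both function pointers are 16-bit, and that writing a non-zero value to switchScene is what requests the switch; check the engine source for the actual sizes and flag semantics. TitleInit, TitleThink, and TitlePpuDef are hypothetical labels.

```asm
    ; Request a switch to a (hypothetical) title scene
    lda #.loword(TitleInit)
    sta sceneInitFunc
    lda #.loword(TitleThink)
    sta sceneThinkFunc
    lda #1                      ; assumption: non-zero means "switch"
    sta switchScene

    ; Inside the init function: load PPU register values from a PpuDef
    ldx #.loword(TitlePpuDef)   ; defined elsewhere with Define_PpuDef
    jsl SetupPpuRegisters       ; a long function, so jsl rather than jsr
```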

Your scene think function should do the following:

Game Objects

Game Objects (or just Objects) are the living, breathing entities that reside in your game, including player characters/objects, enemies, NPCs, bullets, items, gimmicks, etc. etc. You get a fixed number of slots for objects - meaning you can have up to 128 objects by default, but this limit can be changed at assembly time. Of course, you do not need to use this system to keep track of all objects in your game - you might have objects that keep track of their own subobjects, effectively bypassing this system. However, for dynamically creating and destroying objects, you will want to make use of these slots so you can take advantage of Object Pools (see below).

Each object has a pointer to a thinker function which runs object logic every frame. This pointer resides in the object's objThinker variable. Empty object slots are implemented by setting their thinkers to point to a function that immediately moves to the next object. When creating your object, make sure you set objThinker to point to the address of your object's thinker function. Other object variables may be set optionally.

At the end of your thinker function, you may use rts to move to the next object. However, it is faster to use the Macro_NextThinker macro, as this advances iteration without needing to perform any stack operations. Each use of the macro expands to six bytes, so if you're running low on ROM space you might consider removing this macro from less performance-critical thinkers and replacing it with rts, but otherwise you should use the macro wherever you have the chance.

Before you pass control to the next object, you must make sure that the X index is the same value as it was when you entered the current thinker. If you need to use X for something other than the index of the current object, remember to restore it to its previous value when you're done. You should avoid incrementing X yourself (outside the provided macros) unless you know what you are doing, or else you will skip over objects and/or crash the game.
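One simple way to honor this rule is to save X on the stack around any code that borrows it. The sketch below assumes AXY16; myTableIndex and SpeedTable are hypothetical names used only for illustration.

```asm
MyLookupThinker:
    phx                         ; save the current object's index
    ldx myTableIndex            ; hypothetical variable: borrow X for a lookup
    lda .loword(SpeedTable), x  ; hypothetical table of 8.8 velocities
    plx                         ; restore X before using object variables again
    sta objVelX, x
    jsr MoveObject
    Macro_NextThinker
```

The plx does not disturb the accumulator, so the value loaded from the table is still available afterward.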

Due to the memory layout, object indexes are multiples of 3. An index value of 0 means the first object, and a value of 3 means the second object, 6 is the third, etc. This is to support 24-bit position values: the format is fixed point with 16.8 precision. The low byte is the fractional value, meaning there are 256 subpixels available per pixel. The mid and high byte are the integral part, meaning a range from 0-65535 whole pixels is possible. Velocity values are 16-bit signed with 8.8 fixed point precision, meaning that the possible speed range is from -128 to just under 128 pixels.
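As a worked example of the 8.8 velocity format, the fragment below stores a speed of +1.5 pixels horizontally and -0.75 pixels vertically. It assumes AXY16 and that X holds the current object's index.

```asm
    ; 8.8 velocity: high byte = whole pixels, low byte = 1/256ths of a pixel
    lda #$0180              ; +1.5 pixels  (1 + $80/256)
    sta objVelX, x
    lda #$ff40              ; -0.75 pixels (two's complement of $00c0)
    sta objVelY, x
```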

The memory layout looks like this (assuming 128 objects):

Object index offset  Low byte                 Middle byte                      High byte
0 (Object 0)         X position (fractional)  X position (integral, low byte)  X position (integral, high byte)
3 (Object 1)
...
381 (Object 127)
0 (Object 0)         Y position (fractional)  Y position (integral, low byte)  Y position (integral, high byte)
...
381 (Object 127)
0 (Object 0)         Custom use               X velocity (fractional)          X velocity (integral)
...
381 (Object 127)
0 (Object 0)         Custom use               Y velocity (fractional)          Y velocity (integral)
...
381 (Object 127)
...

To move an object, set its objVelX and objVelY to the desired values and then call MoveObject. To have the object draw a sprite, set the Y index to point to the address of the spritedef you want drawn, then call AddSprite. More on that in the section below.

Below is an example of a simple thinker which causes the object to fall as if affected by gravity:

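FallingBallThinker:
    lda objVelY, x      ; current Y velocity (8.8 fixed point, AXY16 assumed)
    clc
    adc #$40            ; accelerate downward by a quarter pixel per frame
    sta objVelY, x
    jsr MoveObject      ; apply the velocity to the object's position
    Macro_NextThinker   ; hand off to the next object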

Sprites and Animation

In Superfeather, objects are decoupled from sprites. The appearance of an object is entirely up to its code, and this includes no appearance at all, or a single object that is made up of many separate sprites. This behavior is generally handled in the object's thinker function and may be customized based on features desired. An example is below:

MyBallThinker:
    ; Object processing goes here
    jmp Jmp_AnimateThenDrawThenNext

The above function actually does three things: First, it processes any animation, then it calls AddSprite to draw the resulting sprite or metasprite. Finally, it iterates to the next object in the list.

The spritedef is a data structure that provides information on the hardware sprite being drawn, and is created using the Define_SpriteDef macro:

;                     Name,         X pos, Y pos, sprite size,  tile, vhoopppN, next sprite
Define_SpriteDef      MySpriteDef,  $0000, $0000, SPRITE_SMALL, $22, %00100000, 0

Name is the name of the label for this spritedef. X pos and Y pos are where to draw the sprite relative to the object's position, with positive values going right and down, and negative values going left and up. Sprite size may be either SPRITE_SMALL or SPRITE_LARGE. Tile and vhoopppN are the tile index and sprite flags to use, copied into OAM: vh indicates vertical/horizontal flip, oo indicates sprite priority, ppp is the palette index to use, and N is the nametable to use.

The last parameter, next sprite, is used if you want metasprite functionality. A metasprite is a logical sprite that is made up of several smaller hardware sprites. The next sprite value, then, is used to form a linked list of all the hardware sprites used in the metasprite. If the spritedef is the last one in the metasprite, or if the spritedef is not part of a metasprite, use a value of 0. Otherwise, use the label of the next spritedef in the chain.
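For illustration, a two-sprite metasprite might be chained as sketched below. The labels are hypothetical, and the offsets and tile numbers assume that SPRITE_SMALL is 8x8 pixels and that tile $32 sits directly below tile $22 in the sprite sheet.

```asm
;                Name,        X pos, Y pos, sprite size,  tile, vhoopppN, next sprite
Define_SpriteDef TallTop,     $0000, $0000, SPRITE_SMALL, $22, %00100000, TallBottom
Define_SpriteDef TallBottom,  $0000, $0008, SPRITE_SMALL, $32, %00100000, 0

; Same top piece, horizontally flipped: only the h bit (bit 6) changes
Define_SpriteDef TallTopFlip, $0000, $0000, SPRITE_SMALL, $22, %01100000, 0
```

Passing TallTop to AddSprite would then draw both pieces, since TallTop links to TallBottom and TallBottom ends the chain with 0.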

So how do you create an animation? You use Define_AnimDef:

;              Name,          delay, next anim,     sprite/callback
Define_AnimDef MyAnimDef1,    5,     MyAnimDef2,    MySpriteDef
Define_AnimDef MyAnimDef2,    5,     MyAnimDef1,    MySpriteDef2

This defines two animation frames, called MyAnimDef1 and MyAnimDef2. These two make up a (simple) looping animation. The delay value is the number of tics to spend on this frame before moving on to the next. Note that a value of 0 will still cause the animation to play, but the frame will last for 256 tics. Next anim is the label of the next animation frame in the sequence. Sprite/callback is a label for related data, based on which animation function you use. For Jmp_AnimateThenDrawThenNext, this is the sprite you want drawn.

The other function for animation is AnimateWithCallback. Here, the sprite/callback parameter is a label to a function to call once the animation lands on this frame. This can be used for a variety of effects, such as choreographing attack sequences, or playing sound cues. Note that this function does not draw the sprite itself. An example is below:

;              Name,          delay, next anim,     sprite/callback
Define_AnimDef MyAnimDef1,    5,     MyAnimDef2,    0
.word .loword(MySpriteDef)
Define_AnimDef MyAnimDef2,    5,     MyAnimDef1,    PlayMySound
.word .loword(MySpriteDef2)

MyBallThinker:
    ; Object processing goes here
    jsr AnimateWithCallback
    ldy objAnimDef, x
    lda ANIMDEF_DATAEX, y
    tay
    jsr AddSprite
    Macro_NextThinker

You might have noticed that this example defines additional data after each animdef. In this case, it is the spritedef we want to draw. We can access this later by loading the animdef pointed to by this object's objAnimDef into the index register, then using the ANIMDEF_DATAEX constant to load the data directly after the animdef. This gives us our spritedef, which we then pass to AddSprite.

This system is flexible, as it allows you to associate multiple possible spritedefs with a given animation frame. For example, you could use this to provide sprites for when a character is facing different directions, without having to use separate animations for them.
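A sketch of the facing-direction idea, under stated assumptions: each animdef is followed by two spritedef pointers (left, then right), and playerFacingRight is a hypothetical 16-bit variable that is zero when facing left. All labels other than the engine's own are made up for the example.

```asm
;              Name,       delay, next anim,  sprite/callback
Define_AnimDef WalkAnim1,  5,     WalkAnim2,  0
.word .loword(WalkLeft1), .loword(WalkRight1)
Define_AnimDef WalkAnim2,  5,     WalkAnim1,  0
.word .loword(WalkLeft2), .loword(WalkRight2)

WalkerThinker:
    ; Object processing goes here
    jsr AnimateWithCallback
    ldy objAnimDef, x
    lda playerFacingRight       ; hypothetical flag: 0 = left, non-zero = right
    beq _Left
    lda ANIMDEF_DATAEX+2, y     ; second word after the animdef
    bra _Draw
_Left:
    lda ANIMDEF_DATAEX, y       ; first word after the animdef
_Draw:
    tay
    jsr AddSprite
    Macro_NextThinker
```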

Low Level Sprites

While this animation system works for a variety of objects, if you need a simple static sprite or animation for a large number of objects (e.g. bullets), you will save on processing time by instead using drawing code like this:

MyBulletThinker:
    ; Object processing goes here
    ldy #.loword(MySpriteDef)
    jsr AddSpriteFast
    Macro_NextThinker

AddSpriteFast is a variant of AddSprite that removes support for metasprites and dynamic attributes for a small performance boost.

Object Pools

In order to create and destroy game objects on the fly, there must be some facility to pick an unused slot which can be allocated to a newly created object. Object pools are Superfeather's solution to this problem. These pools function as lists (queues) containing all unused slots so that they can be located and assigned with minimal overhead. When creating an object, a slot is popped from the head of the queue. When deleting an object, the object's slot is pushed back into the queue to be reused at a later time by another object.

You may have up to eight object pools in use at a time. Each one may cover a different range of slots. However, these ranges must not overlap. Use of multiple pools allows you to allocate separate regions for different kinds of objects and iterate over them efficiently. For example, you could decide to have a pool dedicated to enemy bullets, and in the player's collision routine iterate only over these objects.

During scene initialization, you should set your object pool configuration by calling SetObjectPools with X pointing to a word array of the following format:

Pool 0 start, Pool 0 end, Pool 1 start, Pool 1 end, ..., Pool 7 start, Pool 7 end

To create an object, call CreateObject. To delete an object, call DeleteObject with X set to the index of the object to delete. For both calls, Y must be set to the index of the object pool you would like to use.
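The calls above might be used as in the sketch below. Several details are assumptions rather than documented behavior: that the pool index is passed as a plain number from 0 to 7, that CreateObject returns the new object's index in X (hence the phx/plx around it), and that an object should be deleted through the same pool it was created from. Pool 1 being reserved for bullets is purely hypothetical.

```asm
    ; Spawn a bullet from pool 1 while running inside another object's thinker
    phx                         ; keep the current object's index safe
    ldy #1                      ; assumption: pool index passed as 0-7
    jsr CreateObject
    bvs _NoFreeSlot             ; V set: the operation failed, nothing to set up
    lda #.loword(MyBulletThinker)
    sta objThinker, x           ; assumption: X = index of the new object
_NoFreeSlot:
    plx                         ; back to the current object

    ; Later, inside the bullet's own thinker (X = this object):
    ldy #1                      ; presumably the pool the object came from
    jsr DeleteObject
```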

To avoid corruption of the object pool, follow these guidelines:

DMA Transfers

The engine includes a simple DMA manager that allows you to queue up transfers to run during the next VBlank. More advanced management functionality will be added in a future update.

To use, load the size of your transfer into the accumulator register, and then call into DMATryAdd. If there is enough VBlank time to perform this transfer, the transfer will be scheduled, and you should set the dmaTableSrc, dmaTableSrcBank, and dmaTableDest variables via X index. Otherwise, the CPU overflow bit is set. Here is a usage example:

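    lda #$100                   ; size of the transfer
    jsr DMATryAdd
    bvs _DontDMA                ; V set: not enough VBlank time left

    lda #.loword(myTilemap)     ; source address within its bank
    sta dmaTableSrc, x
    lda #$80                    ; source bank
    sta dmaTableSrcBank, x
    lda #$7800                  ; destination address
    sta dmaTableDest, x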

If you want to load 8-pixel-tall strips of graphics into VRAM to represent a 32-pixel-tall sprite, look into using the DMATryAddStrip4 and DMAMakeStrip4 functions:

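    ; The size of a single strip
    lda #$80
    jsr DMATryAddStrip4
    bvs _DontDMA                ; V set: the transfer could not be scheduled

    lda #.loword(myCharData)    ; source address within its bank
    sta dmaTableSrc, x
    lda #$80                    ; source bank
    sta dmaTableSrcBank, x
    lda #$0000                  ; destination address
    sta dmaTableDest, x
    jsr DMAMakeStrip4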

Audio

Superfeather uses SNESGSS as its audio driver. The package includes the driver and a tracker for authoring audio content. See the SNESGSS documentation for more information on that.

At the beginning, you will need to upload the audio driver to the SPC via the Macro_SpcUpload macro. This is done for you. Afterward, you may begin sending commands via the Macro_SpcCommand macro. To upload a song, make sure the SPC is in "upload mode" (this can be done with the SPC_CMD_LOAD command), and then use Macro_SpcUploadMusic.

To play sounds, simply write 8-bit values to the following variables: soundAId, soundAPan, soundAVolume, soundBId, soundBPan, and soundBVolume.
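A hedged sketch of triggering a sound on channel A follows. The numeric values are placeholders: this document does not specify the valid ranges for pan and volume or whether the order of the writes matters, so consult the engine source and the SNESGSS documentation before relying on them.

```asm
    sep #$20                ; 8-bit accumulator for 8-bit writes
    .a8
    lda #2                  ; hypothetical sound effect number
    sta soundAId
    lda #128                ; placeholder pan value
    sta soundAPan
    lda #127                ; placeholder volume value
    sta soundAVolume
    rep #$20                ; back to AXY16
    .a16
```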

Before you can build your game, you will need to export a symbol named SpcDriver pointing to the audio driver in ROM. See the next section for details.

Content Creation

Superfeather currently includes three tools to assist in creating content for your game, each with its own purpose.

GFX2SNES

This third-party tool can be used to convert common image formats into raw data that can be used on the SNES. Example usage for a standard 16-color-per-tile background:

../superfeather-bintools-lin/gfx2snes -n -gs8 -m -pc16 -fbmp data_src/tree.bmp

This will generate three files. tree.pal contains the raw palette data to upload into CGRAM. tree.pic is the raw char data to upload into the BG/sprite char region of VRAM. tree.map is the tilemap to upload into the BG tilemap region of VRAM.

If you just want to generate a tile/sprite sheet and don't want any tile optimization, this usage is more appropriate:

../superfeather-bintools-lin/gfx2snes -n -m! -mR! -gs8 -m -pc16 -fbmp data_src/dragon.bmp

This functionality will likely be handled by a new tool in a future release, once scrolling level support is available.

Heartcore Editor

Written in LOVE2D, this custom tool is useful for checking and tweaking the output from GFX2SNES. It currently has three editors which focus on viewing and editing raw SNES data. In the future, it will support editing for custom engine-specific formats as well.

To run the editor, make sure LOVE2D is installed, and then open tools/editor/heartcore_editor_rN.love.

Note that because SDL2 (and by extension LOVE2D) does not have a way to test for "window damage" events, you may experience display issues unless you have desktop effects enabled.

SNESGSS Tracker

This third-party tool lets you author content for the audio driver. You can find it at tools/snesgss/. If you are on Linux you will need to use Wine. Documentation is provided in a separate readme.txt.

You should use the first six channels for music, and channels 7 and 8 (named A and B in-engine) for sound effects.

When you export your project, you will end up with several files. The only important files are spc700.bin and the various music_X.bin files - the rest are not compatible with the ca65 assembler and should be ignored. Copy these .bin files into your data/audio/ folder. To include the driver and music files, use code similar to the following:

.export SpcDriver

.segment "DATA3"

SpcDriver:
    .incbin "data/audio/spc700.bin"
Music1:
    .incbin "data/audio/music_1.bin"

Examples

You can find several examples inside the examples/ folder to help you get started:

The helloworld example is a relatively barebones example that shows you how to get basic graphics on-screen. It uses a simple scene which uploads a palette, char graphics, and a tilemap during initialization. The scene's thinker merely turns on the screen and does not actually perform any object processing logic. This is a good place to start if you're new to the framework.

The walker example shows you how to create a basic player-controlled character who can walk around the screen. You will learn how to respond to controller input, move and animate a character object, have the character face different directions, and play sound effects.

The walker-dynamic example is based on the walker example. This time, however, the sprite graphics are streamed into VRAM instead, saving a significant portion of VRAM. This is most likely what you want if you're making a player character.

The bounceballs example features up to 128 moving objects bouncing around the screen at once. You will learn how to dynamically create and destroy objects. This example also serves as a benchmark, showing you the current CPU usage.

Performance Tips

Currently, sprite handling is one of the most performance-intensive operations, and I don't see this changing anytime soon. While I have done my best to try and optimize the sprite code while still keeping it flexible, you will need to make sure you're not calling AddSprite when you don't need to. For objects using metasprites, this is even more important, as 128 objects each displaying 4-sprite metasprites may run the AddSprite code up to 512 times, which is guaranteed to slow your game down.

So what exactly is happening? Well, the OAM format only has nine bits for the X coordinate, and eight bits for Y, so we need to ensure that each sprite we want to display is actually on-screen. Before this, of course, we avoid doing any work if we know that we have already filled up the OAM with 128 sprites.

The result is that if objects are attempting to draw way more than 128 sprites, many of these sprites won't show up. But as objects leave the screen, CPU usage actually increases! How can that be? Well, normally we don't have to do the range test on much more than 128 sprites, because we hit the 128 sprite limit early. But now that some are off-screen, we end up testing the range of many more sprites before we hit the sprite limit! Therefore, make sure you either destroy or prevent drawing of objects that are outside the screen.

Note that AddSprite does its own on-screen check - if any hardware sprite is found to be completely outside the screen, the CPU overflow bit (V) is set. Therefore, to save CPU time, you should only perform the deactivation check under this condition.
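In a thinker, this pattern might look like the sketch below. The deactivation check itself is left as a comment because its details depend on your game. Since the overflow bit also signals failure in general, V may be set when the sprite could not be added for other reasons, which the check should tolerate.

```asm
MyEnemyThinker:
    ; Object processing goes here
    ldy #.loword(MySpriteDef)
    jsr AddSprite
    bvc _StillVisible       ; V clear: skip the extra work entirely
    ; V set: only now run the more expensive deactivation check,
    ; e.g. compare the object's position against the screen bounds
    ; and delete the object if it has moved far enough away.
_StillVisible:
    Macro_NextThinker
```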

Everything else mostly boils down to common sense. Avoid doing more work than you need to. Pay special attention to routines that are called frequently, as they are the most important to optimize. And remember that register and ROM accesses are faster than RAM, so avoid doing unnecessary loads and stores. If you're writing a function that takes a lot of parameters, consider instead having it take a pointer (via index register) to a data structure.

Troubleshooting

The game is displaying an error message saying that it broke.

The CPU executed a BRK instruction (byte value of 0). This indicates a problem in your code. There are several common causes for this error:

In some cases you may be able to set an execute (X) breakpoint within the range just before the faulting address to get some context for the error. If this doesn't work, try digging through your stack to determine what may have led up to the error.

The game locked up for four seconds, and then told me that the "audio driver has a sleep".

The main CPU tried to communicate with the audio CPU and either it hasn't responded or it failed to complete the transaction in a timely manner. This can happen if you try to send audio driver commands (SpcSendCommand) before the audio driver has been started or even uploaded, or if you try to start an upload when the SPC is not waiting for an upload. In extreme cases it may indicate an audio driver crash.

The game locked up and isn't displaying an error on-screen.

Your program may be stuck executing an infinite loop. Double-check any iterator code to make sure that your variable(s) are being stepped correctly, and that the exit condition is eventually satisfied. Pausing execution via the debugger will very quickly help you diagnose the issue in this case. Of course, it could also be that the CPU hit a BRK and the game was somehow unable to display the error for you.

Known Issues

Legal

The source code to Superfeather is provided under the zlib License. See LICENSE for more details.

Trademarks belong to their respective owners. The use of any trademarks is for informational purposes only and does not imply endorsement from the owners of the trademarks in question.