Writing to Microchip's PIC Internal EEPROM Without Waiting

Background

I've been working on a project where I'd like to log statistical information about power-ons and run-time to the internal EEPROM on a Microchip PIC 12F683. I don't do much advanced programming so I'm using an old version of MPLAB (v8.88) with a PICSTART+ clone by Olimex and the CCS C Compiler version 4. I realize that combination of tools is not perfect, but it suits my needs.

I have had success using CCS's write_eeprom() functions. But first, let's go back a bit.

Reading the EEPROM data from a PIC is fairly straightforward: load the EEPROM address into the EEADR register(s) (just one byte to address the 256 bytes available on the little '683), set EECON1 bit 0 (the "RD" bit) to 1, then read the data out of EEDAT.

Writing, however, is a bit more complex. In the PIC 12F683 manual, there is a snippet of code to accomplish this[1. PIC12F683 Data Sheet, Microchip, 2007.]:

BANKSEL EECON1 ;
BSF EECON1,WREN ; Enable write
BCF INTCON,GIE ; Disable INTs
BTFSC INTCON,GIE ; See AN576
GOTO $-2 ;
MOVLW 55h ; Unlock write
MOVWF EECON2 ;
MOVLW AAh ;
MOVWF EECON2 ;
BSF EECON1,WR ; Start the write
BSF INTCON,GIE ; Enable INTS

The snippet highlights the part where 0x55 and 0xAA are loaded into EECON2 just prior to starting the write. This is the focus of most problems people have getting the EEPROM to write. But I was seeing something different.

The code that CCS C generates is basically the same:

MOVF INTCON,W
MOVWF intcon_temp
BCF INTCON.GIE
BSF STATUS.RP0
MOVLW 0x55
MOVWF EECON2
MOVLW 0xAA
MOVWF EECON2
BSF EECON1.WR
BTFSC EECON1.WR
GOTO $-1
BCF EECON1.WREN
BCF STATUS.RP0
MOVF intcon_temp,W
IORWF INTCON,F

It is not as careful about clearing the general interrupts (more on that in a minute), but it does wait for the write to complete before clearing the EECON1 write enable (WREN). It is also clever about restoring GIE by saving and restoring the whole INTCON register, although that ultimately wastes a RAM location if you, as the programmer, know what state GIE should be in and can just BSF it if necessary. The big problem I have is that the EEPROM takes around 5 ms to perform a write, and the write_eeprom() function blocks interrupts for that whole time. My application generates a precise-as-possible ~5 kHz square wave using the timer interrupts, so the clock "hangs" for 5 ms (roughly 25 periods of the output) each time a byte is written to the EEPROM. That's no good for me.

Now a little aside … I thought the Microchip method of clearing GIE was interesting. Their Application Note #576 outlines the reason for this:

To disable all interrupts, either the Global Interrupt Enable (GIE) bit must be cleared or all the individual interrupt enable bits must be cleared. An issue arises when an instruction clears the GIE bit and an interrupt occurs "simultaneously". For example, when a program executes the instruction BCF INTCON, GIE (at address PC), there is a possibility that an interrupt will occur during this instruction. If an interrupt occurs during this instruction, the program would complete execution of this instruction, and then immediately branch to the user’s interrupt service routine. This occurs because the GIE bit was not clear (disabled) when the interrupt occurred. Normally at the end of the interrupt service routine is the RETFIE instruction. This instruction causes the program to return to the instruction at PC + 1, but also sets the GIE bit (enabled). Therefore the GIE bit is not cleared as expected, and unintended program execution may occur.

They offer 4 workarounds in the application note. The one suggested in the data sheet is to keep clearing GIE and wait for it to stick.

The reason for all this hoopla is that the EEPROM module has some protection against runaway code writing over good data in the EEPROM, or otherwise banging it to an early death:

  1. EEADR and EEDAT must be filled with the address and data to write.
  2. EECON1.WREN must be 1
  3. The sequence 0x55 then 0xAA must be written to EECON2, and then EECON1.WR must be set to 1 … umm … "quickly". I haven't looked very hard, but I haven't seen a specification for this.

If I were implementing the EEPROM module, I'd pick some small number of instruction clocks from the time 0x55 is written to EECON2 to the completion of the sequence of events. Written as pseudocode, something like:

start: Wait for 0x55 to be written to EECON2.
Set count to 8.
Until 0xAA is written to EECON2: decrement count; if 0 == count, goto start.
Until EECON1.WR is set to 1: decrement count; if 0 == count, goto start.
Start writing the EEPROM.
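
To make that guess concrete, here is a minimal host-side sketch of it in C. Everything in it is an assumption for illustration: the event names, the `unlock_model_t` type, and the 8-instruction window all come from my pseudocode, not from any Microchip documentation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the unlock logic guessed at above. This is NOT
   Microchip's documented implementation; all names here are made up. */
enum { UNLOCK_WINDOW = 8 };   /* assumed instruction budget from the pseudocode */

typedef enum { EV_OTHER, EV_EECON2_55, EV_EECON2_AA, EV_SET_WR } event_t;
typedef enum { WAIT_55, WAIT_AA, WAIT_WR } unlock_state_t;

typedef struct {
    unlock_state_t state;
    int count;            /* instructions left before the window closes */
    bool write_started;
} unlock_model_t;

/* Feed one executed instruction into the model. */
static void unlock_step(unlock_model_t *m, event_t ev)
{
    switch (m->state) {
    case WAIT_55:
        if (ev == EV_EECON2_55) {
            m->state = WAIT_AA;
            m->count = UNLOCK_WINDOW;
        }
        break;
    case WAIT_AA:
        if (ev == EV_EECON2_AA)
            m->state = WAIT_WR;
        else if (--m->count == 0)
            m->state = WAIT_55;       /* too slow: start over */
        break;
    case WAIT_WR:
        if (ev == EV_SET_WR) {
            m->write_started = true;
            m->state = WAIT_55;
        } else if (--m->count == 0) {
            m->state = WAIT_55;       /* too slow: start over */
        }
        break;
    }
}

/* Returns true if the instruction stream would start an EEPROM write. */
bool sequence_starts_write(const event_t *seq, size_t n)
{
    unlock_model_t m = { WAIT_55, 0, false };
    for (size_t i = 0; i < n; i++)
        unlock_step(&m, seq[i]);
    return m.write_started;
}
```

Feeding it the standard sequence (write 0x55, one other instruction for the MOVLW, write 0xAA, set WR) starts a write, while padding the sequence with more than eight unrelated instructions does not. Note that this model would tolerate a single stray NOP, so if the forum reports are right, the real hardware is stricter than my guess.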

I do not believe there is any kind of program-memory reader in place that checks for the exact sequence of opcodes that forms:

MOVLW 0x55
MOVWF EECON2
MOVLW 0xAA
MOVWF EECON2
BSF EECON1.WR

Nonetheless, there's little reason to deviate from that—save for a compiler that might insert the BSF STATUS.RP0 to access the EECON* register bank after the first MOVLW 0x55. However, I've also read a thread that said that even an additional NOP would thwart the EEPROM write, so keep that segment of code tight.

Down The Wrong Rabbit Hole

I thought I'd get clever and reactivate GIE right after starting the EEPROM write, then let my main program loop twiddle its thumbs waiting for EECON1.WR to clear, all the while allowing the clock to run:

BSF EECON1.WREN
BCF INTCON.GIE
BTFSC INTCON.GIE
GOTO $-2
MOVLW 0x55
MOVWF EECON2
MOVLW 0xAA
MOVWF EECON2
BSF EECON1.WR
BSF INTCON.GIE
BTFSC EECON1.WR
GOTO $-1
BCF EECON1.WREN

But all of a sudden it's dead in the water: the EEPROM stays at its initialized values. Unfortunately, I don't have a way to see if the main loop is stopped as the interrupts keep running and the outputs clock like I expect.

It seems improbable that inserting BSF INTCON.GIE between setting EECON1.WR and waiting for it to clear would cause such a problem, especially since the result is so similar to Microchip's own code; the only difference is the added wait for EECON1.WR to clear.

My debugging now turns to the interrupt handling. All the interrupts funnel into one interrupt handler which just looks at the timer interrupt. I took a closer look at the setup and this is how the CCS C sets things up:

BSF STATUS.RP0 ; enable_interrupts (INT_TIMER2)
BSF PIE1.TMR2IE
MOVLW C0 ; enable_interrupts (GLOBAL);
BCF STATUS.RP0
IORWF INTCON,F

What's interesting to me is that the compiler never explicitly sets PIE1 to anything, so PIE1.EEIE is not explicitly cleared, which could be related to the cause of my trouble. Performing an inclusive-OR of 0xC0 (binary 1100 0000) into INTCON sets both GIE (bit 7) to allow interrupts and PEIE (bit 6) to permit the peripheral interrupts to fire. Nonetheless, I fixed the code to clear all the peripheral interrupts in my startup code. That didn't work.

As a long-shot, I figured I'd try testing the EECON1.WR right away without doing anything about it. Perhaps there's a bug/quirk in the EEPROM write module that "needs" it to be read for the EEPROM write to proceed? Alas that didn't do it. So I figured I'd change my code to match the CCS compiler and just see if that did the trick: 5ms delays and all. Surprisingly I didn't observe the 5ms delays, but it did write the EEPROM … sometimes. And apparently I've got a bug somewhere that may have to do with byte ordering …

On the other hand, maybe EECON1.WR is cleared too fast after starting a write for some reason and I should be looking to PIR1.EEIF instead. This seemed to work better for me but I don't understand why. Could I have chips with a bug? That sounds extremely unlikely.

I got around the potential timing issues of disabling interrupts by sidestepping them entirely. I'd set up the EEADR and EEDAT registers, then set a flag to request the EEPROM write. The interrupt handler itself would issue the "magic sequence" and start the write, while the main program loop would wait for the flag to clear and then proceed with its own wait for the write to finish.
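
Here is a rough sketch of that hand-off in C, written to run on a desktop rather than on the PIC. The register variables, the `sim_` helpers, and the `eeprom_write_pending` flag are all stand-ins I made up for illustration; on real hardware the registers are SFRs and `sim_unlock_and_start()` would be the 0x55/0xAA sequence followed by BSF EECON1.WR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins for the PIC registers (assumptions for illustration). */
static uint8_t sim_eeprom[256];
static volatile uint8_t EEADR, EEDAT;
static volatile bool WREN, WR, EEIF;          /* EECON1.WREN, EECON1.WR, PIR1.EEIF */
static volatile bool eeprom_write_pending;    /* hypothetical main-loop/ISR flag */

/* Stands in for the 0x55/0xAA unlock followed by BSF EECON1.WR. */
static void sim_unlock_and_start(void)
{
    if (WREN)
        WR = true;
}

/* Stands in for the hardware finishing the write about 5 ms later. */
void sim_write_complete(void)
{
    if (WR) {
        sim_eeprom[EEADR] = EEDAT;
        WR = false;
        EEIF = true;
    }
}

/* Main loop: stage one byte and ask the ISR to start the write.
   Returns false if an earlier write is still pending or in progress. */
bool request_eeprom_write(uint8_t addr, uint8_t data)
{
    if (eeprom_write_pending || WR)
        return false;
    EEADR = addr;
    EEDAT = data;
    EEIF = false;
    WREN = true;
    eeprom_write_pending = true;
    return true;
}

/* Timer ISR: nothing can interrupt the unlock sequence from in here. */
void timer_isr(void)
{
    /* ... toggle the square-wave output here ... */
    if (eeprom_write_pending) {
        sim_unlock_and_start();
        eeprom_write_pending = false;
    }
}

/* Main loop poll: true once the ISR has started the write and it has finished. */
bool eeprom_write_done(void)
{
    if (eeprom_write_pending || WR || !EEIF)
        return false;
    WREN = false;                 /* re-arm the write protection */
    return true;
}
```

The main loop calls `request_eeprom_write()` and then polls `eeprom_write_done()` between its other chores, so the timer interrupt keeps firing throughout the write.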

I made a debug block of code that, instead of trying to record legit statistics, just works its way through the EEPROM, loading each address with its address once a second. I let it run for the requisite 255 seconds and checked the results. There were no errors.

Read Errors?

So maybe I have the writing down … perhaps it's in the read? I thought about changing the code to add a delay between setting the EECON1.RD bit and fetching the data, so:

movf data,W
movwf EEADR
bsf EECON1.RD
nop
movf EEDAT,W

but I doubt that would have helped since my code wasn't working well. The first problem I was having (likely something I was "doing wrong") was trying to increment EEADR directly. Changing the code to increment a separate register and then load EEADR from it didn't seem to work right either. The other potential problem was using a read-and-assign function which would take a reference to a long and then try to fill it in. I was hoping to use the function inline so I'd have access to the register as a constant, but the compiler wouldn't let me. The idea was to movf EEDAT,W then movwf variable, and then likewise with variable+1, etc. The compiler, internally, could do this, but it insisted variable was a constant value. And unfortunately it decided, bizarrely, to use the indirect addressing functionality, which added a bunch of code.

I reverted back to using the compiler's built-in read_eeprom() function and I finally met with success. The values I had were updating like I expected. I'll add a bit of code to verify each byte was written correctly (and rewrite indefinitely if necessary), but otherwise I'm confident things are working like I want.

The built-in read_eeprom() function has one additional quirk that my code didn't have: it clears bit 7 of the EECON1 register before setting EECON1.RD. If you're writing your own read routine, that may be worth checking out.

But a False Success

Unfortunately I still got very odd EEPROM behavior. I was finding that sort-of random values were being inserted in the EEPROM. See, I had four 32-bit registers I kept logging. Seemingly at random, some of the registers were somehow set to register | 0x00000200, although the 0x02 value could appear in any byte (typically the second-lowest), and not every time.

I kept working on the EEPROM write function, adding a bit of code that would re-read the value and keep trying to write it back. I also got generous with the write code, as apparently on the 12F683, messing with EECON1, EEADR, or EEDAT before both EECON1.WR cleared and PIR1.EEIF was set could cause write problems.

I also set up the write routine to reset EEADR to 0xFF (which I was not using) so any spurious writes would not affect any valid data.

But I was still getting the same problems. By now I figured the writing was correct and that there must be some error with the registers in RAM. I suspected the read routines so I decided to write my own.

They are particularly vanilla routines, merely reading a byte and putting it away (fetching 4 bytes in a row is left as an exercise to even the most inexperienced programmer). They have only two special features: before initiating a read, they wait for EECON1.WR to be cleared, and after the read is complete, they set EEADR to 0xFF.
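
For the record, the 4-byte exercise might look something like the following C sketch, again run against a simulated EEPROM rather than the real registers. The low-byte-first ordering is an assumption on my part, as are the variable names standing in for the SFRs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins for the PIC hardware (assumptions for illustration). */
static uint8_t sim_eeprom[256];
static volatile uint8_t EEADR, EEDAT;
static volatile bool WR;             /* models EECON1.WR */

enum { EE_PARK_ADDR = 0xFF };        /* unused location, as described above */

uint8_t eeprom_read_byte(uint8_t addr)
{
    while (WR) { }                   /* let any write in progress finish */
    EEADR = addr;
    EEDAT = sim_eeprom[EEADR];       /* stands in for BSF EECON1.RD */
    uint8_t value = EEDAT;
    EEADR = EE_PARK_ADDR;            /* park the address register */
    return value;
}

/* Assumes the low byte is stored at the lowest address. */
uint32_t eeprom_read_u32(uint8_t addr)
{
    uint32_t value = 0;
    for (uint8_t i = 0; i < 4; i++)
        value |= (uint32_t)eeprom_read_byte((uint8_t)(addr + i)) << (8 * i);
    return value;
}
```

Whatever byte order you pick, the write side has to use the same one; a mismatch there would look a lot like the byte-ordering bug I suspected earlier.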

This has mostly cleared up my problems. I still see that spurious 0x02 appear, but much less frequently. I have no idea how to proceed from here.

Another Gotcha

Actually, several.

I found a Microchip forum topic that suggested turning on brown-out reset, since a brown-out can cause random data to be written to EEPROM. It didn't affect my application at all since I was either using a solid 5V supply or the power was shut off and the 5V rail dropped to 0V.

Likewise, another forum topic suggests problems with power-supply decoupling capacitors could be at fault. A saggy rail when writing to the EEPROM could cause problems. An inadequate pull-up resistor on the MCLR pin (if it is configured as a reset) could have the same effect.

Power Loss Data Integrity

One other thing I realized was if the power was lost while the registers were being written, the code would never know there was a problem. Since I had the space, I made a second mirrored set of registers. Before starting to modify one set, the code sets a flag in the EEPROM by setting one byte to 0xFF. Once it finishes modifying that set, it resets the byte to 0x00. Then it repeats with the other set, setting a different byte to 0xFF, modifying the set, and resetting it to 0x00.

On boot-up, the code checks for the 0x00 in the right place. It uses the values from the first valid set it finds (under the assumption that writing completed successfully). If both sets are invalid, it just resets all the values as they can't be guaranteed good.
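
A minimal sketch of the mirrored-set scheme in C follows, using an array in place of the EEPROM so it can be tried on a desktop. The layout (a flag byte followed by four 32-bit counters, low byte first, with set 0 at address 0) and the `sim_fail_after()` power-loss hook are assumptions made up for this example, not my actual memory map.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: each bank is one flag byte followed by four
   32-bit counters, low byte first. Bank 0 starts at address 0. */
enum { NUM_COUNTERS = 4, BANK_DATA = NUM_COUNTERS * 4,
       BANK_SIZE = 1 + BANK_DATA, NUM_BANKS = 2 };
enum { FLAG_VALID = 0x00, FLAG_DIRTY = 0xFF };

static uint8_t sim_eeprom[256];      /* host-side stand-in for the EEPROM */
static int writes_left = -1;         /* test hook: -1 means power never fails */

/* Test hook: pretend power is lost after n more byte writes. */
void sim_fail_after(int n) { writes_left = n; }

static void ee_write(uint8_t addr, uint8_t data)
{
    if (writes_left == 0) return;    /* power is gone; the write is lost */
    if (writes_left > 0) writes_left--;
    sim_eeprom[addr] = data;
}

static void save_bank(uint8_t bank, const uint32_t c[NUM_COUNTERS])
{
    uint8_t base = (uint8_t)(bank * BANK_SIZE);
    ee_write(base, FLAG_DIRTY);                       /* mark as in progress */
    for (uint8_t i = 0; i < BANK_DATA; i++)
        ee_write((uint8_t)(base + 1 + i), (uint8_t)(c[i / 4] >> (8 * (i % 4))));
    ee_write(base, FLAG_VALID);                       /* mark as complete */
}

/* Update one bank completely before touching the other. */
void save_counters(const uint32_t c[NUM_COUNTERS])
{
    for (uint8_t bank = 0; bank < NUM_BANKS; bank++)
        save_bank(bank, c);
}

/* Load from the first valid bank; returns false (and zeroed counters)
   if neither bank is marked valid. */
bool load_counters(uint32_t c[NUM_COUNTERS])
{
    memset(c, 0, NUM_COUNTERS * sizeof c[0]);
    for (uint8_t bank = 0; bank < NUM_BANKS; bank++) {
        uint8_t base = (uint8_t)(bank * BANK_SIZE);
        if (sim_eeprom[base] != FLAG_VALID)
            continue;
        for (uint8_t i = 0; i < BANK_DATA; i++)
            c[i / 4] |= (uint32_t)sim_eeprom[base + 1 + i] << (8 * (i % 4));
        return true;
    }
    return false;
}
```

If power fails partway through updating the first set, its flag is still 0xFF, so boot-up falls through to the second set, which still holds the previous values. The cost is that every update writes each value twice, plus four flag writes.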

I considered setting up a CRC check but decided it was not that critical. These logging values are not for any mission-critical function, so I didn't care if they got mysteriously garbled.

Conclusions

The PIC internal EEPROM has a number of quirks and requirements that are not immediately obvious:

  1. EEADR and EEDAT are not normal registers and shouldn't be treated as such. For instance, although valid, don't try and EEADR++ or incf EEADR.
  2. Reading data takes a couple of instruction cycles, but writing data takes approximately 5 ms, and that time varies with temperature and input voltage.

For reading data from the EEPROM:

  1. Wait for EECON1.WR to be cleared in case a write is still in process.
  2. Set EEADR to the address to read.
  3. Set EECON1.RD.
  4. Read the value from EEDAT.
  5. If you are having trouble reading, try a BCF EECON1.7 before the BSF EECON1.RD (maybe only if you're using a PIC12F683). Also, try fetching the value from EEDAT as soon as possible after issuing the read request.
  6. Once done reading, set EEADR to a location in EEPROM you are not using in case of spurious writes (particularly during a brownout/power loss).

For writing data to the EEPROM:

  1. Set EEADR and EEDAT to the address and data.
  2. Wait for EECON1.WR to be 0 to ensure the last write finished.
  3. Set EECON1.WREN to 1 to allow the EEPROM to be written.
  4. Make sure interrupts can't delay execution of the "magic" sequence (either by executing this within a top-priority interrupt routine, or after disabling interrupts):

    MOVLW 0x55
    MOVWF EECON2
    MOVLW 0xAA
    MOVWF EECON2
    BSF EECON1.WR

  5. On some PICs like the 12F683, wait for both EECON1.WR to be cleared and for PIR1.EEIF to be set before changing EECON1, EEADR, or EEDAT.
  6. In theory you can re-enable interrupts and continue running code. But for better EEPROM protection, clear EECON1.WREN as soon as EECON1.WR is 0.
  7. Verify and retry all writes for added assurance.
  8. Once done writing, set EEADR to a location in EEPROM you are not using in case of spurious writes (particularly during a brownout/power loss).
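
Strung together, the write steps above might look like this C sketch. It runs against a simulated EEPROM so it can be tested on a desktop; the register variables, `sim_start_write()`, and the `sim_drop_writes` hook for faking failed writes are all assumptions for illustration. On the PIC, step 4 would be the unlock sequence shown above with interrupts held off.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side stand-ins for the PIC hardware (assumptions for illustration). */
static uint8_t sim_eeprom[256];
static volatile uint8_t EEADR, EEDAT;
static volatile bool WREN, WR, EEIF;   /* EECON1.WREN, EECON1.WR, PIR1.EEIF */
static int sim_drop_writes;            /* test hook: upcoming writes to lose */

/* Stands in for the unlock sequence plus the hardware write itself. */
static void sim_start_write(void)
{
    if (!WREN)
        return;
    WR = true;
    if (sim_drop_writes > 0)
        sim_drop_writes--;             /* simulate a write that didn't take */
    else
        sim_eeprom[EEADR] = EEDAT;
    WR = false;                        /* hardware clears WR when finished */
    EEIF = true;                       /* ...and raises the EEPROM interrupt flag */
}

/* Write one byte, read it back, and retry up to max_tries times. */
bool eeprom_write_verified(uint8_t addr, uint8_t data, uint8_t max_tries)
{
    for (uint8_t attempt = 0; attempt < max_tries; attempt++) {
        while (WR) { }                 /* step 2: previous write finished */
        EEADR = addr;                  /* step 1: address and data */
        EEDAT = data;
        EEIF = false;
        WREN = true;                   /* step 3: allow writes */
        sim_start_write();             /* step 4: unlock and set WR */
        while (WR || !EEIF) { }        /* step 5: wait for WR clear and EEIF set */
        WREN = false;                  /* step 6: re-arm write protection */
        bool ok = (sim_eeprom[addr] == data);   /* step 7: verify */
        EEADR = 0xFF;                  /* step 8: park on an unused address */
        if (ok)
            return true;
    }
    return false;
}
```

I capped the retries here rather than rewriting indefinitely, since a worn-out cell would otherwise hang the main loop forever; pick whichever failure you'd rather live with.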



Time Machine "The backup was not performed …" error

I just started getting a Time Machine error when backing up my Macintosh Mini (mid-2010) running "Snow Leopard" OSX 10.6.8. I get a charmingly cryptic error:

The backup was not performed because an error occurred while copying files to the backup disk.

The problem may be temporary. Try again later to back up. If the problem persists, use Disk Utility to repair your backup disk.

Well I tried repairing the backup drive to no avail. I'm now repairing disk permissions on my main hard drive. But what did help is a Dashboard Widget called Time Machine Buddy. If you run it as an administrator, you can view the backup log and see which file it's having a problem with. To be honest, I have no idea why it is suddenly taking issue with a number of files, but it is.

Upon further investigation, I'm getting an "Error -36" reading the file. This is not good—according to Apple's Common System Error Messages page:

This file is having difficulty while either reading from the drive or writing to the drive. The file may have been improperly written data to the drive or the hard drive or disk may be damaged. This is almost always indicative of a media error (hard error on the disk). Sometimes (rarely) it is transient.

Definitely not good … I guess I'll need to get a new main hard drive. I should be less surprised than I am: I bought the machine in September, 2010, so given how quickly things go bad, 5 years shouldn't be such a surprise. What a pain, though … I'm not looking forward to this.

Update 2015-Dec-3:

I noticed Time Machine started having issues with external drives as well which didn't jibe with a failing system drive. I found another suggestion to do a "full reset" on Time Machine. I shortened the procedure to:

  1. In System Preferences:Time Machine, click Options and remove any drives you can (certain drives are permanently added.) Then turn Time Machine "Off" and quit System Preferences.
  2. Eject the backup drive and power it off.
  3. Delete /Library/Preferences/com.apple.TimeMachine.plist. You'll need to have administrator privileges to do this.
  4. Mount the backup drive, reset any optional exclusions in the Time Machine preferences then start a backup.

I don't know if this will fix my problems—one thing I forgot to do was to repair the external disks (especially those causing problems.)

And another thing: if you don't want to use Time Machine Buddy, you can also review the system.log in the Console app. Again, you'll need administrator privileges to see the file, but you can type "backupd" in the search (actually "filter") box and review what Time Machine—a.k.a. backupd—is doing. I found this more useful than Time Machine Buddy for hardcore debugging—TMB is good to have if you've got an occasional problem and want to take a quick peek at what's up.

I also noticed an error concerning being unable to parse the SystemMigration.log. I found it in Console and it was from when I installed the system software 5 years ago so I deleted it. I'm also running a Disk Utility Repair on all the external drives.

Update 2016-Nov-30:

Almost a year later and I think I may have found a more substantial solution! After another similar failure, I tried switching my backups to use Carbon Copy Cloner from Bombich Software. It's much less integrated than Time Machine, but after several days of failures I figured I would at least have some semblance of a backup. However, it started giving me messages about filesystem errors, and among their suggestions was one to check the signal connections.

I hadn't thought about this since the days of SCSI, when a poorly constructed cable or a bad enclosure or device, along with minor environmental changes like humidity, temperature, or the phase of the moon would suddenly cause system instability. Having a chain of more than a couple devices, weaving between 25-pin, 50-pin, and 68-pin standards, and deciding whether active or passive termination was the right choice was a nightmare—and all without any diagnostics other than drives failing to mount or disappearing or getting corrupted.

USB seemed to fix all that, but its improvements come at a hidden cost. USB is far more resilient when it comes to poor connections, but once a certain threshold is crossed, the connection becomes, well, wonky. It's a bit like how digital TV fixed all of analog TV's picture distortion … until the signal gets so poor that digital just quits (at that point analog is preferable again: the picture was very noisy, but at least it continued to work).

Anyway, the tl;dr version is to try taking apart your whole USB chain and all the hubs. Install the more critical backup drives closer (fewer hubs) to the computer. Blow dust out of connectors. Once you get the signal integrity improved enough, all the problems seem to go away.
