Learn Protocol Buffers (Protobuf) for serializing structured data — Part 2
In the previous post, we covered the basics of Protocol Buffers. In this post, we will discuss some of its more advanced concepts.
Oneof
- If you have a message with many fields and where at most one field will be set at the same time, you can enforce this behavior and save memory by using the oneof feature.
- A oneof cannot contain map or repeated fields.
- Setting a oneof field will automatically clear all other members of the oneof. So if you set several oneof fields, only the last field you set will still have a value, and all the others will be cleared (reading them returns the default value).
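The rules above can be sketched with a small schema. The message and field names here (SearchResult, url, error_message, redirect_code) are hypothetical and only serve as an illustration:

```proto
syntax = "proto3";

// Hypothetical message, used only for illustration.
message SearchResult {
  string query = 1;

  // At most one of these three fields holds a value at any time.
  // Setting one of them clears the other two.
  oneof result {
    string url = 2;
    string error_message = 3;
    int32 redirect_code = 4;
  }
}
```

Note that the fields inside the oneof share the tag number space of the enclosing message, so they must not collide with the tag of query.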
Maps
- Maps can be used to map scalar keys (any integer or string type, but not float, double, or bytes) to values of any type except another map
- Map fields cannot be repeated
- Map items are not ordered
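As a minimal sketch of the map syntax, assuming two hypothetical messages named Project and Portfolio:

```proto
syntax = "proto3";

// Hypothetical value type for the map below.
message Project {
  string name = 1;
}

// Hypothetical message, used only for illustration.
message Portfolio {
  // String keys mapped to message values.
  map<string, Project> projects = 1;

  // Integer keys mapped to scalar values.
  map<int32, string> labels = 2;
}
```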
Well Known Types
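Protocol Buffers ships with a set of Well Known Types such as Timestamp and Duration. These are predefined message types that are distributed with protobuf itself, so you can import them in any .proto file instead of defining your own. The full list is in the official Protocol Buffers documentation.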
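Here is a short sketch of how the well-known types are imported and used. The Job message and its field names are hypothetical; the two import paths and the google.protobuf type names are the ones that ship with protobuf:

```proto
syntax = "proto3";

import "google/protobuf/timestamp.proto";
import "google/protobuf/duration.proto";

// Hypothetical message, used only for illustration.
message Job {
  string id = 1;
  google.protobuf.Timestamp started_at = 2;  // a point in time
  google.protobuf.Duration timeout = 3;      // a span of time
}
```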
Packages
You can also define types in separate .proto files and import them, which is useful if you want to re-use your own definitions or build on .proto files created by someone else.
With the package keyword, the generated code is placed in the package you indicate. It also helps prevent name conflicts between messages that share the same name.
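A minimal sketch of two files working together, assuming a hypothetical layout where a shared type lives in common/date.proto and is imported by person.proto. All file, package, and message names are made up for this example.

First file, common/date.proto:

```proto
syntax = "proto3";

package common;

message Date {
  int32 year = 1;
  int32 month = 2;
  int32 day = 3;
}
```

Second file, person.proto:

```proto
syntax = "proto3";

package people;

import "common/date.proto";

message Person {
  string name = 1;
  // The imported type is referenced by its package-qualified name.
  common.Date birthday = 2;
}
```

Because Date is qualified by its package, another file could define its own Date message in a different package without a conflict.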
Options
- Options allow altering the behavior of the protoc compiler when generating code for specific languages.
Here are a few of the most commonly used options:
- java_package — The package you want to use for your generated Java classes.
- java_outer_classname — The class name for the wrapper Java class you want to generate.
- java_multiple_files — If true, separate .java files will be generated for each of the Java classes.
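The three Java options above could look like this in a .proto file. The package and class names are placeholders chosen for this sketch:

```proto
syntax = "proto3";

package example.person;

// Java package for the generated classes (placeholder value).
option java_package = "com.example.person";
// Name of the generated wrapper class (placeholder value).
option java_outer_classname = "PersonProtos";
// Generate one .java file per top-level type instead of nesting
// everything inside the wrapper class.
option java_multiple_files = true;

message Person {
  string name = 1;
}
```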
Updating Protocols
There is always a need to update the protocol as new requirements come in and old ones become obsolete.
When you first declare a message in your protocol, you have a defined set of requirements. But as time goes on, your business will evolve and you may have a different set of requirements. Some fields may change, some fields may be added and others may be removed. So you need to be able to evolve the schema without breaking the other applications that read the data.
So your code should be forward and backward compatible as changes are incorporated.
Rules for Updating Protocol
- Don’t change the numeric tags for any existing fields.
- You can add new fields and the old code will just ignore them.
- Similarly, if old or new code reads data in which a field it knows about is missing, the default value for that field is picked up. The default value should always be interpreted with care, because it cannot be told apart from a value that was never set.
- Fields can be removed, as long as the tag number is not used again in your updated message type. You may want to rename the field instead, perhaps adding the prefix “OBSOLETE_”, or make the tag reserved so that future users of your .proto can’t accidentally reuse the number.
Adding Fields
Let’s add a field in our schema with a new tag number
If the new field is sent to the old code, the code will not know what that tag number corresponds to and the field will be ignored. And if we read old data with new code, the new field will not be found and the default value is assumed.
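As a sketch of this change, assume a hypothetical Account message that originally had only the fields id and first_name. The updated schema adds email with a tag number that was never used before:

```proto
syntax = "proto3";

// Hypothetical message, used only for illustration.
message Account {
  int32 id = 1;
  string first_name = 2;

  // New in this revision. Old code skips tag 3 when it sees it;
  // new code reading old data gets the default value (empty string).
  string email = 3;
}
```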
Renaming Fields
Let’s rename an existing field in our schema
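In this case, nothing changes in the serialized data, as field names can be freely changed: the binary format depends on the tag number and not on the field name. The generated accessor names do change, so code that uses the field has to be updated.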
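Continuing with a hypothetical Account message whose tag 2 used to be called first_name, a rename could look like this:

```proto
syntax = "proto3";

// Hypothetical message, used only for illustration.
message Account {
  int32 id = 1;

  // Previously: string first_name = 2;
  // Same type and same tag number, so existing binary data still decodes.
  string given_name = 2;
}
```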
Removing Fields
Let’s remove a field in our schema
If old code reads data written by new code, it will not find the field anymore and the default value will be used. And if we read old data with the new code, the deleted field is treated as unknown and ignored.
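A sketch of the removal, again assuming a hypothetical Account message that used to have a field first_name with tag 2:

```proto
syntax = "proto3";

// Hypothetical message, used only for illustration.
message Account {
  int32 id = 1;

  // Removed in this revision: string first_name = 2;
  // Tag 2 stays unused from now on.

  string email = 3;
}
```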
Reserved Tags
When removing a field, you should always reserve the tag and the name.
This prevents both the tag and the name from being re-used, which is necessary to avoid conflicts in the codebase. We reserve tag numbers so that new fields cannot re-use them and misread old data, and we reserve field names to prevent code bugs. You can reserve several tag numbers in one reserved statement and several field names in another, but numbers and names cannot be mixed in the same statement.
Never remove any reserved tags from the message.
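A minimal sketch of the reserved syntax, assuming a hypothetical Account message from which several fields were removed over time. The tag numbers and names are made up for this example:

```proto
syntax = "proto3";

// Hypothetical message, used only for illustration.
message Account {
  // Tag numbers of removed fields; ranges are written with "to".
  reserved 2, 5, 9 to 11;
  // Names of removed fields, in a separate statement.
  reserved "first_name", "nickname";

  int32 id = 1;
  string email = 3;
}
```

With these statements in place, protoc rejects any new field that tries to use one of the reserved numbers or names.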
You learned how to structure your protocol buffer data using protocol buffer language, including how to generate data access classes from your .proto files with instructions on how to add, rename, remove, and reserve fields to maintain backward and forward compatibility.