I ran a sample pipeline to test the difference between StreamSerializer and Portable, and found that StreamSerializer is not much different from Portable; quite often it is even slower.
I checked with the Jet team, who said that StreamSerializer makes more sense when combined with a high-performance library such as protobuf. So I tried again, using protobuf inside the StreamSerializer, and the result is amazing!
    # test is running locally, 1 jet node
    # the measurement is the pipeline completion time

    # with StreamSerializer
    time cost is 41503ms // just start jet
    time cost is 34506ms // submit job again, run 2nd time
    time cost is 29617ms // submit job again, run 3rd time
    time cost is 32847ms // submit job again, run 4th times
    time cost is 37604ms // restart jet, run again
    time cost is 31349ms // submit job again
    time cost is 32874ms // submit job again
    time cost is 32874ms // restart jet, run again
    time cost is 39657ms // restart jet, run again

    # with Portable
    time cost is 33360ms // just start jet
    time cost is 31187ms // submit job again
    time cost is 34705ms // restart jet
    time cost is 30896ms // submit job again
    time cost is 31170ms // submit job again
    time cost is 32517ms // restart jet
    time cost is 30114ms // submit job again

    # with Protobuf
    time cost is 8995ms // cold start
    time cost is 7715ms
    time cost is 7398ms
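For readers who want to see the shape of "protobuf inside a StreamSerializer", here is a minimal sketch that uses only the JDK so it runs on its own. Everything in it is an assumption for illustration: the Trade class and its toByteArray/parseFrom methods are hypothetical stand-ins for a protobuf-generated message, and the static write/read methods stand in for the two methods of a Hazelcast StreamSerializer (whose ObjectDataOutput and ObjectDataInput extend the java.io.DataOutput and DataInput used here). It is not the code that produced the timings above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class DelegatingSerializerSketch {

    /** Hypothetical payload; stands in for a protobuf-generated message class. */
    static final class Trade {
        final String symbol;
        final long quantity;

        Trade(String symbol, long quantity) {
            this.symbol = symbol;
            this.quantity = quantity;
        }

        // Stand-in for the generated toByteArray(): the object encodes itself.
        byte[] toByteArray() throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(symbol);
            out.writeLong(quantity);
            return buf.toByteArray();
        }

        // Stand-in for the generated static parseFrom(byte[]).
        static Trade parseFrom(byte[] bytes) throws IOException {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            return new Trade(in.readUTF(), in.readLong());
        }
    }

    // Serializer "write": delegate the encoding, then emit a length prefix and the bytes.
    static void write(DataOutput out, Trade trade) throws IOException {
        byte[] bytes = trade.toByteArray();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Serializer "read": read the length prefix, then hand the bytes back to the codec.
    static Trade read(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return Trade.parseFrom(bytes);
    }

    // Writes a Trade to an in-memory buffer and reads it back.
    static Trade roundTrip(Trade trade) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            write(new DataOutputStream(buf), trade);
            return read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Trade copy = roundTrip(new Trade("HZ", 42L));
        System.out.println(copy.symbol + " " + copy.quantity);
    }
}
```

The point of the pattern is that the serializer itself stays trivial: it only moves a byte array in and out of the stream, and all field-level encoding is left to the external library.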
With StreamSerializer, I noticed that putting an object into the map also calls the read function, although I expected it to call only the write function. The reason is that put has to return the old value if there is one, which triggers the read function. To avoid this, try the set(K, V) function instead, which does not return the old value.
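The put/set difference can be illustrated with a toy model. The sketch below is not Hazelcast code: BinaryStore is a hypothetical class that keeps values as bytes and counts how often its own serialize and deserialize helpers run, mimicking a map that stores entries in serialized form. It only assumes what is stated above, namely that put returns the previous value while set returns nothing.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class PutVsSetSketch {

    /** Hypothetical store that holds serialized values and counts serializer calls. */
    static final class BinaryStore {
        private final Map<String, byte[]> data = new HashMap<>();
        int writes; // number of serialize ("write") calls
        int reads;  // number of deserialize ("read") calls

        private byte[] serialize(String value) {
            writes++;
            return value.getBytes(StandardCharsets.UTF_8);
        }

        private String deserialize(byte[] bytes) {
            reads++;
            return new String(bytes, StandardCharsets.UTF_8);
        }

        // put returns the previous value, so an existing entry must be deserialized.
        String put(String key, String value) {
            byte[] old = data.put(key, serialize(value));
            return old == null ? null : deserialize(old);
        }

        // set returns nothing, so the previous bytes are simply discarded.
        void set(String key, String value) {
            data.put(key, serialize(value));
        }
    }

    public static void main(String[] args) {
        BinaryStore viaPut = new BinaryStore();
        viaPut.put("k", "v1");
        viaPut.put("k", "v2"); // overwrites, so the old value is read back

        BinaryStore viaSet = new BinaryStore();
        viaSet.set("k", "v1");
        viaSet.set("k", "v2"); // overwrites without reading the old value

        System.out.println("put: writes=" + viaPut.writes + " reads=" + viaPut.reads);
        System.out.println("set: writes=" + viaSet.writes + " reads=" + viaSet.reads);
    }
}
```

In this model the read only happens when a key is overwritten, which matches the observation that the read function is touched during put even though nothing is explicitly read.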