24 Mar 2021

Reading Notes - DDIA (1)

Foundations of Data Systems

Data Models:

  1. Data models are layered:

    • Application layer (data structure) -> database layer -> byte layer -> hardware layer
    • Abstraction layers hide the complexity of the layer below and provide a clean interface
  2. NoSQL vs RDBMS

    • RDBMS has been around for a long time and generalizes well to most internet use cases; many attempts to overthrow it have failed
    • NoSQL came along with a lot of hype because of:
      • the need for greater scalability, for very large datasets and very high write throughput
      • a preference for free and open source software
      • specialized query operations that are not well supported by SQL
      • the desire for a more dynamic and expressive data model
  3. Impedance Mismatch: the awkward translation between objects in application code and the table model in an RDBMS

    • ORMs exist to hide the boilerplate code of this translation
  4. Many-to-One and Many-to-Many

    • Advantage of normalization: deduplication of human-meaningful information
    • An RDBMS supports the required joins easily
    • In document databases, joins are often not needed for tree-like (one-to-many) data, but support for joins is weak when many-to-one or many-to-many relationships do appear
  5. The debate over how to represent many-to-many relationships

    • Network Model: uses pointers as access paths, scanning one record at a time until the one you are looking for is found
      • Document databases reverted back to the hierarchical model in one respect: nested records are stored within the parent record rather than in a separate table
      • Foreign keys in the relational model are essentially the same idea as document references in the document model
  6. Relational vs Document DB:

    1. Document DB: schema flexibility, closer to the application's data structures, better data locality for some access patterns
    2. Relational DB: better support for joins, many-to-one and many-to-many relationships

    3. Closer to the application's data structures: depends on whether many-to-many relationships are needed (the relational model is better if they are); a document DB can work around this with denormalization, but that requires constant application-code updates to keep the copies consistent (see the sketch after this list)
    4. Schema flexibility: a document database has no fixed schema at the database level, but the application code reading a record implies one - schema-on-read (analogous to dynamic type checking); a relational database has an explicit schema at the database level - schema-on-write (analogous to compile-time checking)
      • Schema changes (ALTER TABLE) can be slow on large tables (MySQL, for example, copies the entire table)
    5. Data locality for queries: a document database stores each document as one continuous string, while a relational database may need multiple index lookups to assemble the same data
      • This advantage only applies when you need large parts of the document at the same time
      • On update, the whole document usually needs to be rewritten
    6. (Skipped the section on query languages for data)
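
A minimal sketch of the normalization trade-off above, using plain Python dicts as stand-ins for tables and documents (the table and field names are made-up examples):

```python
# Normalized (relational style): relationships are expressed through IDs,
# so each piece of information lives in exactly one place.
users = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
groups = {10: {"name": "Readers"}}
memberships = [(1, 10), (2, 10)]          # "join table" of (user_id, group_id)

# Denormalized (document style): the group name is embedded in every user
# document, so renaming the group means updating every copy in application code.
user_docs = [
    {"name": "Alice", "groups": [{"id": 10, "name": "Readers"}]},
    {"name": "Bob",   "groups": [{"id": 10, "name": "Readers"}]},
]

# The normalized version answers "who is in group 10?" with a join-like lookup:
members = [users[u]["name"] for (u, g) in memberships if g == 10]
print(members)   # -> ['Alice', 'Bob']
```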

Storage and Retrieval

An idea of what the storage engine is doing under the hood helps you tune it to perform well on your kind of workload.

  1. Log Storage: only append new data; a read returns only the most recent value for a key (a minimal sketch follows at the end of this item)

    • An in-memory hash index keeps track of the byte offset of each key

    • Log data can be stored in data file segments; a segment can be closed and then compacted (deduplicated so that only the most recent value per key is kept)
      • During compaction it is also common to merge several small old segments into a new segment; this can happen in the background while reads and writes are still served from the old segments, switching over to the new segment when it is ready
    • SSTable and LSM-Tree:
      • A Sorted String Table (SSTable): instead of appending records to the file immediately, the key-value pairs in each segment are kept sorted by key
        • Use an in-memory balanced tree structure (a memtable) to keep the key-value pairs sorted
        • When the in-memory tree grows large, write it out to disk as an SSTable segment
        • When reading, first check the in-memory tree, then the most recent SSTable segment, then older ones
        • Keep a separate append-only log on disk so the in-memory tree can be restored after a crash
    • Problems:
      • File format - CSV has escaping issues; use a binary format (a length header followed by the raw string)
      • Deleting records - append a special deletion record (a tombstone), which tells compaction to discard any earlier values for that key
      • Crash recovery - the hash index lives in memory, so keeping a snapshot of each segment's index on disk speeds up recovery
      • Partially written records - need a way to detect and ignore them (e.g. checksums)
      • Concurrency - writes need to happen in strict sequential order, typically from a single writer thread
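
A minimal sketch (not production code) of the append-only log with an in-memory hash index described in this item; the file name and the "key,value" line format are illustrative choices, with the same escaping problem noted above:

```python
import os

class LogKV:
    def __init__(self, path="kv.log"):
        self.path = path
        self.index = {}              # key -> byte offset of the latest record
        self.f = open(path, "a+b")   # append-only data file

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(f"{key},{value}\n".encode())
        self.f.flush()
        self.index[key] = offset     # only the most recent offset is kept

    def get(self, key):
        if key not in self.index:
            return None
        self.f.seek(self.index[key])
        line = self.f.readline().decode().rstrip("\n")
        _, value = line.split(",", 1)
        return value

db = LogKV()
db.put("user:1", "alice")
db.put("user:1", "bob")   # the old record stays in the log; the index points to the new one
print(db.get("user:1"))    # -> bob
```
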
  2. Indexing: an additional structure that is derived from the primary data

    • Well-chosen indexes speed up read queries, but every index slows down writes, since the index must also be updated on each write
  3. B-Tree: instead of variable-size segments, break the database down into fixed-size blocks (pages) (a minimal sketch follows at the end of this item)

    • Start from a root page, which contains some keys and references to child pages
      • Each child is responsible for a continuous range of keys, and the keys between the references indicate where the boundaries lie
      • Branching factor: the number of children a page can reference
      • When updating: find the leaf page containing the key through the references and update it in place; the references remain valid
      • When inserting: find the page whose range encloses the key; if there is no room, split it into two half-full pages; this keeps the tree balanced (N keys give a depth of O(log N))
    • Writing in a B-tree means overwriting a page on disk with new data; note this differs from an LSM-tree, which only appends
      • A single write may need to touch several pages, especially when splitting a page, because the parent must be updated with the new references
        • This is a dangerous operation: if the database crashes before the references are fully updated, you can end up with an orphan page
        • To survive crashes, use a write-ahead log (WAL): every B-tree operation must be written to the log before it is applied to the pages themselves, so the tree can be restored after a crash
        • Concurrency is also an issue: multiple threads updating the B-tree can easily corrupt the references, so lightweight locks (latches) are needed
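
A minimal in-memory sketch of B-tree lookup and insertion with page splitting (disk pages, the WAL, and latches are omitted; the branching factor and class names are made up for illustration):

```python
import bisect

BRANCHING = 4  # max children per node, so max keys per node is BRANCHING - 1

class Node:
    def __init__(self, leaf=True):
        self.keys = []        # sorted keys
        self.children = []    # for internal nodes: len(children) == len(keys) + 1
        self.leaf = leaf

class BTree:
    def __init__(self):
        self.root = Node(leaf=True)

    def search(self, key, node=None):
        node = node or self.root
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.leaf:
            return False
        return self.search(key, node.children[i])   # follow the child whose range encloses key

    def insert(self, key):
        if len(self.root.keys) == BRANCHING - 1:     # root is full: grow the tree by one level
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        self._insert_nonfull(self.root, key)

    def _split_child(self, parent, i):
        full = parent.children[i]
        mid = (BRANCHING - 1) // 2
        right = Node(leaf=full.leaf)
        right.keys = full.keys[mid + 1:]
        parent.keys.insert(i, full.keys[mid])        # the middle key moves up into the parent
        full.keys = full.keys[:mid]
        if not full.leaf:
            right.children = full.children[mid + 1:]
            full.children = full.children[:mid + 1]
        parent.children.insert(i + 1, right)

    def _insert_nonfull(self, node, key):
        i = bisect.bisect_left(node.keys, key)
        if node.leaf:
            node.keys.insert(i, key)
            return
        if len(node.children[i].keys) == BRANCHING - 1:
            self._split_child(node, i)               # split on the way down so there is always room
            if key > node.keys[i]:
                i += 1
        self._insert_nonfull(node.children[i], key)

t = BTree()
for k in [10, 20, 5, 6, 12, 30, 7, 17]:
    t.insert(k)
print(t.search(6), t.search(15))   # -> True False
```
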
  4. Comparing Storage Engines:

    • B-trees are typically faster for reads (a read only has to look in one place), while LSM-trees are typically faster for writes (sequential appends)
    • Write amplification: one logical write turns into multiple physical writes to disk (see the note after this item)
      • B-trees incur it through the WAL and through rewriting whole pages (including node splits)
      • LSM-trees incur it through merging and compaction, but they usually sustain higher write throughput because their write amplification is lower than a B-tree's
    • An LSM-tree's compaction can be expensive: if incoming writes have to wait for a compaction to finish, throughput can drop significantly
      • If compaction cannot keep up with the incoming writes, the number of unmerged segments keeps growing, which also slows down reads because more segments must be checked; this needs monitoring and tuning
      • A B-tree's performance is more predictable
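
One common way to quantify write amplification (the ratio below is the usual definition, stated here as a side note rather than taken from the book):

```latex
\text{write amplification} \approx
  \frac{\text{bytes physically written to disk}}
       {\text{bytes logically written by the application}}
```

For example, if the application writes 1 GB of new data and the engine ends up writing 3 GB across the WAL, page rewrites, and compaction, the write amplification is about 3.
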
  5. OLAP and OLTP:

    • OLTP - Online Transaction Processing:
      • Scans a small number of records by some key, using an index
      • Needs to be highly available and to process transactions with low latency
    • OLAP - Online Analytical Processing:
      • Scans a huge number of records, reading only a few columns per record, and calculates aggregate statistics
    • Both offer a SQL query interface, but they are optimized for very different query patterns
    • Star and Snowflake Schemas (schemas for analytics):
      • Each event is stored as a row in the fact table, with foreign keys pointing into dimension tables
      • A snowflake schema is a variation in which the dimension tables are further broken down into sub-dimensions
    • Columnar storage: store all the values from the same column together, instead of all the values from the same row together (see the sketch after this list)
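
A small sketch contrasting row-oriented and column-oriented layouts (the table and column names are made-up examples):

```python
rows = [
    {"date": "2021-03-24", "product": "apple",  "quantity": 3},
    {"date": "2021-03-24", "product": "banana", "quantity": 5},
    {"date": "2021-03-25", "product": "apple",  "quantity": 2},
]

# Row store: all values of one record are kept together (good for OLTP point lookups).
# Column store: all values of one column are kept together (good for OLAP scans).
columns = {name: [r[name] for r in rows] for name in rows[0]}

# An analytic query ("total quantity per product") touches only two columns,
# so a column store can skip reading the others entirely.
from collections import defaultdict
totals = defaultdict(int)
for product, qty in zip(columns["product"], columns["quantity"]):
    totals[product] += qty
print(dict(totals))   # -> {'apple': 5, 'banana': 5}
```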

Encoding and Evolution

  1. Encoding: the translation from an in-memory data structure (objects, structs, lists, arrays, hash tables, trees, etc.) to a byte sequence (e.g. a JSON document); the reverse is decoding.
    • Language-specific encoding formats:
      • limit the ability to integrate with other systems (tied to one language)
      • the decoder may instantiate arbitrary classes, which is a security problem
      • versioning is an afterthought, which may compromise forward/backward compatibility
      • efficiency is an afterthought
    • JSON & XML are popular human-readable encodings, despite some flaws (ambiguous number handling, lack of schema support, no native binary strings, etc.)
    • Thrift and Protocol Buffers: require a schema for the data being encoded (a toy sketch of tag-based encoding follows at the end of this item)
      • The binary encoding of each field carries a type annotation, a length indication, and the raw data
      • Field changes: instead of field names, these formats store numeric field tags, so adding a new field or removing an old one gives backward and forward compatibility naturally (old code can ignore a newly added field; new code can ignore a removed field)
      • Data type changes: possible, but may lead to truncation or loss of precision
    • Avro: no field tags are stored in the byte sequence, so the reading code must know the schema the data was written with
      • The reader's schema and the writer's schema only need to be compatible, not identical
      • With default values, backward and forward compatibility can be achieved (e.g. a new reader fills in the default for a field that an old writer did not include, and an old reader fills in the default for a field that a new writer removed)
      • Friendlier to dynamically generated schemas, since there are no field tags to assign by hand
    • Benefits of schema-based binary encodings:
      • more compact than textual formats
      • the schema is documentation that is guaranteed to be up to date, because it is required for decoding
      • keeping a database of schemas lets you check forward/backward compatibility before deployment
      • code generation for statically typed programming languages
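
A toy sketch of why numeric field tags give forward and backward compatibility: the decoder keys fields by tag and skips tags it does not know. This is not the real Thrift or Protocol Buffers wire format; the tag/length/value layout below is a made-up illustration:

```python
import struct

def encode(fields):
    # fields: {tag (int): raw value (bytes)}
    out = b""
    for tag, raw in fields.items():
        out += struct.pack(">BI", tag, len(raw)) + raw   # tag, length, raw bytes
    return out

def decode(data, known_tags):
    result, pos = {}, 0
    while pos < len(data):
        tag, length = struct.unpack_from(">BI", data, pos)
        pos += 5
        raw = data[pos:pos + length]
        pos += length
        if tag in known_tags:        # fields this code doesn't know about are skipped
            result[tag] = raw
    return result

# Backward compatibility: a new reader (knows tags 1, 2, 3) reads data from an old writer.
old_payload = encode({1: b"alice", 2: b"alice@example.com"})
print(decode(old_payload, known_tags={1, 2, 3}))

# Forward compatibility: an old reader (knows tags 1, 2) reads data from a new writer.
new_payload = encode({1: b"bob", 2: b"bob@example.com", 3: b"extra field"})
print(decode(new_payload, known_tags={1, 2}))
```
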
  2. Dataflow:
    • Via database: writing to the database is encoding, reading from it is decoding
    • Via network: a service-oriented architecture aims to make the application easier to change and maintain by making services independently deployable and evolvable
      • REST vs SOAP:
        • REST builds on the features of HTTP
        • SOAP aims to be independent of HTTP and avoids using most HTTP features
    • Asynchronous message-passing: data is sent through a message broker, which delivers it to recipients with low latency (a tiny sketch follows at the end of this list)
      • Message broker:
        • serves as a buffer if the recipient is unavailable or overloaded
        • can redeliver messages to a process that has crashed, so messages are not lost
        • can deliver one message to several recipients
        • decouples the sender from the recipient
      • Actor frameworks: encapsulate the complexity of threads (race conditions, locking, deadlock)
        • each actor communicates with other actors by sending and receiving asynchronous messages
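
A tiny in-process sketch of asynchronous message passing: a Python queue stands in for a real message broker (e.g. RabbitMQ or Kafka), just to show how the sender is decoupled from the recipient:

```python
import queue
import threading

broker = queue.Queue()            # stand-in for a broker queue/topic

def consumer():
    while True:
        msg = broker.get()        # the broker buffers messages until a consumer is ready
        if msg is None:           # made-up shutdown signal for this sketch
            break
        print("processed:", msg)

t = threading.Thread(target=consumer, daemon=True)
t.start()

# The producer only knows about the broker, not who (or how many) will consume.
for event in ["signup:alice", "signup:bob"]:
    broker.put(event)

broker.put(None)
t.join()
```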
